THE ANNALS OF MATHEMATICAL STATISTICS

(FOUNDED BY H. C. CARVER)

THE OFFICIAL JOURNAL OF THE INSTITUTE OF MATHEMATICAL STATISTICS


Contents

A Class of Statistics with Asymptotically Normal Distribution. Wassily Hoeffding
Optimum Character of the Sequential Probability Ratio Test. A. Wald and J. Wolfowitz
Limiting Distribution of a Root of a Determinantal Equation. D.
On a Source of Downward Bias in the Analysis of Variance and Covariance. William G. Madow
Mixture of Distributions. Herbert Robbins
Some Applications of the Mellin Transform in Statistics. Benjamin Epstein
The Estimation of Linear Trends. G. W. Housner and J. F.
On the Effect of Decimal Corrections on Errors of Observation. Philip Hartman and Aurel Wintner
Weighing Designs and Balanced Incomplete Blocks. K. S.
Bounds for Some Functions Used in Sequentially Testing the Mean of a Poisson Distribution. Leon H. Herbach

Notes:
  The Distribution of Student's t when the Population Means are Unequal. Herbert Robbins
  A Distribution-Free Confidence Interval for the Mean. Louis Guttman
  On the Compound and Generalized Poisson Distributions. E. Cansado Maceda
  On Confidence Limits for Quantiles. Gottfried E. Noether
  A Lower Bound for the Expected Travel Among m Random Points.
  A Matrix Arising in Correlation Theory. H. M. Bacon
  Table of Normal Probabilities for Intervals of Various Lengths and Locations. W. J. Dixon
  Correction to "A Note on the Fundamental Identity of Sequential Analysis". G. E. A
  Correction to "On the Charlier Type B Series".

Abstracts of Papers

News and Notices
Report on the Berkeley Meeting of the Institute


Vol. XIX, No. 3 — September, 1948 





THE ANNALS OF MATHEMATICAL STATISTICS
EDITED BY
S. S. WILKS, Editor

M. S. BARTLETT, WILLIAM G. COCHRAN, ALLEN T. CRAIG, C. C. CRAIG, HARALD CRAMÉR, W. EDWARDS DEMING, J. L. DOOB, W. FELLER, HAROLD HOTELLING, J. NEYMAN, WALTER A. SHEWHART, JOHN W. TUKEY, A. WALD

WITH THE COOPERATION OF

T. W. Anderson, Jr., David Blackwell, J. H. Curtiss, J. F. Daly, Harold F. Dodge, Paul S. Dwyer, Churchill Eisenhart, M. A. Girshick, Paul R. Halmos, Paul G. Hoel, Mark Kac, E. L. Lehmann, William G. Madow, H. B. Mann, Alexander M. Mood, Frederick Mosteller, H. E. Robbins, Henry Scheffé, Jacob Wolfowitz


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2, Md. Subscriptions, renewals, orders for back numbers and other business communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt. Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given issue should be reported to the Secretary on or before the 15th of the month preceding the month of that issue. The months of issue are March, June, September and December.

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts should be typewritten double-spaced with wide margins, and the original copy should be submitted. Footnotes should be reduced to a minimum and whenever possible replaced by a bibliography at the end of the paper; formulae in footnotes should be avoided. Figures, charts, and diagrams should be drawn on plain white paper or tracing cloth in black India ink twice the size they are to be printed. Authors are requested to keep in mind typographical difficulties of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the ANNALS is $8.00 inside the Western Hemisphere and $5.00 elsewhere. Single copies $3.00. Back numbers are available at $8.00 per volume or $3.00 per single issue.

COMPOSED AND PRINTED AT THE WAVERLY PRESS, INC., BALTIMORE, MD., U. S. A.

Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879




A CLASS OF STATISTICS WITH ASYMPTOTICALLY NORMAL DISTRIBUTION¹

By Wassily Hoeffding

Institute of Statistics, University of North Carolina


1. Summary. Let $X_1, \dots, X_n$ be n independent random vectors, $X_\nu = (X_\nu^{(1)}, \dots, X_\nu^{(r)})$, and $\Phi(x_1, \dots, x_m)$ a function of $m\ (\le n)$ vectors $x_\nu = (x_\nu^{(1)}, \dots, x_\nu^{(r)})$. A statistic of the form $U = \sum'' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}) / n(n-1) \cdots (n-m+1)$, where the sum $\sum''$ is extended over all permutations $(\alpha_1, \dots, \alpha_m)$ of m different integers, $1 \le \alpha_i \le n$, is called a U-statistic. If $X_1, \dots, X_n$ have the same (cumulative) distribution function (d.f.) F(x), U is an unbiased estimate of the population characteristic $\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m)$. $\theta(F)$ is called a regular functional of the d.f. F(x). Certain optimal properties of U-statistics as unbiased estimates of regular functionals have been established by Halmos [9] (cf. Section 4).

The variance of a U-statistic as a function of the sample size n and of certain population characteristics is studied in Section 5.

It is shown that if $X_1, \dots, X_n$ have the same distribution and $\Phi(x_1, \dots, x_m)$ is independent of n, the d.f. of $\sqrt{n}(U - \theta)$ tends to a normal d.f. as $n \to \infty$ under the sole condition of the existence of $E\Phi^2(X_1, \dots, X_m)$. Similar results hold for the joint distribution of several U-statistics (Theorems 7.1 and 7.2), for statistics U' which, in a certain sense, are asymptotically equivalent to U (Theorems 7.3 and 7.4), for certain functions of statistics U or U' (Theorem 7.5) and, under certain additional assumptions, for the case of the $X_\nu$'s having different distributions (Theorems 8.1 and 8.2). Results of a similar character, though under different assumptions, are contained in a recent paper by von Mises [18] (cf. Section 7).

Examples of statistics of the form U or U' are the moments, Fisher's k-statistics, Gini's mean difference, and several rank correlation statistics such as Spearman's rank correlation and the difference sign correlation (cf. Section 9). Asymptotic power functions for the non-parametric tests of independence based on these rank statistics are obtained. They show that these tests are not unbiased in the limit (Section 9f). The asymptotic distribution of the coefficient of partial difference sign correlation which has been suggested by Kendall also is obtained (Section 9h).


2. Functionals of distribution functions. Let $F(x) = F(x^{(1)}, \dots, x^{(r)})$ be an r-variate d.f. If to any F belonging to a subset $\mathcal{D}$ of the set of all d.f.'s in the r-dimensional Euclidean space is assigned a quantity $\theta(F)$, then $\theta(F)$ is called a functional of F, defined on $\mathcal{D}$. In this paper the word functional will always mean functional of a d.f.

¹ Research under a contract with the Office of Naval Research for development of multivariate statistical theory.

An infinite population may be considered as completely determined by its d.f., and any numerical characteristic of an infinite population with d.f. F that is used in statistics is a functional of F. A finite population, or sample, of size n is determined by its d.f., S(x) say, and its size n. n itself is not a functional of S since two samples of different size may have the same d.f.

If $S(x^{(1)}, \dots, x^{(r)})$ is the d.f. of a finite population, or a sample, consisting of n elements

(2.1)  $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \quad (\alpha = 1, \dots, n),$

then $nS(x^{(1)}, \dots, x^{(r)})$ is the number of elements $x_\alpha$ such that

$x_\alpha^{(1)} \le x^{(1)}, \dots, x_\alpha^{(r)} \le x^{(r)}.$

Since $S(x^{(1)}, \dots, x^{(r)})$ is symmetric in $x_1, \dots, x_n$, and retains its value for a sample formed from the sample (2.1) by adding one or more identical samples, the same two properties hold true for a sample functional $\theta(S)$. Most statistics in current use are functions of n and of functionals of the sample d.f.

A random sample $\{X_1, \dots, X_n\}$ is a set of n independent random vectors

(2.2)  $X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \quad (\alpha = 1, \dots, n).$

For any fixed values $x^{(1)}, \dots, x^{(r)}$, the d.f. $S(x^{(1)}, \dots, x^{(r)})$ of a random sample is a random variable. The functional $\theta(S)$, where S is the d.f. of the random sample, is itself a random variable, and may be called a random functional.

A remarkable application of the theory of functionals to functionals of d.f.'s has been made by von Mises [18] who considers the asymptotic distributions of certain functionals of sample d.f.'s. (Cf. also Section 7.)


3. Unbiased estimation and regular functionals. Consider a functional $\theta = \theta(F)$ of the r-variate d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and suppose that for some sample size n, $\theta$ admits an unbiased estimate for any d.f. F in $\mathcal{D}$. That is, if $X_1, \dots, X_n$ are n independent random vectors with the same d.f. F, there exists a function $\varphi(x_1, \dots, x_n)$ of n vector arguments (2.1) such that the expected value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$, or

(3.1)  $\int \cdots \int \varphi(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n) = \theta(F)$

for every F in $\mathcal{D}$. Here and in the sequel, when no integration limits are indicated, the integral is extended over the entire space of $x_1, \dots, x_n$. The integral is understood in the sense of Stieltjes-Lebesgue.

The estimate $\varphi(x_1, \dots, x_n)$ of $\theta(F)$ is called unbiased over $\mathcal{D}$.

A functional $\theta(F)$ of the form (3.1) will be referred to as regular over $\mathcal{D}$.²

² This is an adaptation to functionals of d.f.'s of the term "regular functional" used by Volterra [21].







Thus, the functionals regular over $\mathcal{D}$ are those admitting an unbiased estimate over $\mathcal{D}$.

If $\theta(F)$ is regular over $\mathcal{D}$, let $m\ (\le n)$ be the smallest sample size for which there exists an unbiased estimate $\Phi(x_1, \dots, x_m)$ of $\theta$ over $\mathcal{D}$:

(3.2)  $\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m)$

for any F in $\mathcal{D}$. Then m will be called the degree over $\mathcal{D}$ of the regular functional $\theta(F)$.

If the expected value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$ whenever it exists, $\varphi(x_1, \dots, x_n)$ will be called a distribution-free unbiased estimate (d.-f. u.e.) of $\theta(F)$. The degree of $\theta(F)$ over the set $\mathcal{D}_0$ of d.f.'s F for which the right hand side of (3.1) exists will be simply termed the degree of $\theta(F)$.

A regular functional of degree 1 over $\mathcal{D}$ is called a linear regular functional over $\mathcal{D}$. If $\theta(F)$ has the same value for all F in $\mathcal{D}$, $\theta(F)$ may be termed a regular functional of degree zero over $\mathcal{D}$.

Any function $\Phi(x_1, \dots, x_m)$ satisfying (3.2) will be referred to as a kernel of the regular functional $\theta(F)$.

For any regular functional $\theta(F)$ there exists a kernel $\Phi_0(x_1, \dots, x_m)$ symmetric in $x_1, \dots, x_m$. For if $\Phi(x_1, \dots, x_m)$ is a kernel of $\theta(F)$,

(3.3)  $\Phi_0(x_1, \dots, x_m) = \frac{1}{m!} \sum \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the sum is taken over all permutations $(\alpha_1, \dots, \alpha_m)$ of $(1, \dots, m)$, is a symmetric kernel of $\theta(F)$.

If $\theta_1(F)$ and $\theta_2(F)$ are two regular functionals of degrees $m_1$ and $m_2$ over $\mathcal{D}$, then the sum $\theta_1(F) + \theta_2(F)$ and the product $\theta_1(F)\theta_2(F)$ are regular functionals of degrees $\le m = \mathrm{Max}(m_1, m_2)$ and $\le m_1 + m_2$, respectively, over $\mathcal{D}$. For if $\Phi_i(x_1, \dots, x_{m_i})$ is a kernel of $\theta_i(F)$, (i = 1, 2), then

$\theta_1(F) + \theta_2(F) = \int \cdots \int \{\Phi_1(x_1, \dots, x_{m_1}) + \Phi_2(x_1, \dots, x_{m_2})\}\, dF(x_1) \cdots dF(x_m)$

and

$\theta_1(F)\theta_2(F) = \int \cdots \int \Phi_1(x_1, \dots, x_{m_1}) \Phi_2(x_{m_1+1}, \dots, x_{m_1+m_2})\, dF(x_1) \cdots dF(x_{m_1+m_2}).$

More generally, a polynomial in regular functionals is itself a regular functional.

Examples of linear regular functionals are the moments about the origin,

$\mu'_{\nu_1 \cdots \nu_r} = \int \cdots \int (x^{(1)})^{\nu_1} \cdots (x^{(r)})^{\nu_r}\, dF(x^{(1)}, \dots, x^{(r)}).$







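A moment about the mean is a polynomial in moments $\mu'$ about 0, and hence a regular functional over the set $\mathcal{D}_0$ of d.f.'s for which it exists (cf. Halmos [9]). For instance, the variance of $X^{(1)}$,

$\sigma^2 = \int \int \{(x_1^{(1)})^2 - x_1^{(1)} x_2^{(1)}\}\, dF(x_1^{(1)})\, dF(x_2^{(1)}),$

is a regular functional of degree 2. A symmetrical kernel of $\sigma^2$ is $(x_1^{(1)} - x_2^{(1)})^2 / 2$. If $\mathcal{D}$ is the set of univariate d.f.'s with mean $\mu$ and existing second moment, $\sigma^2$ is a linear regular functional of F over $\mathcal{D}$, since then we have

$\sigma^2 = \int (x_1^{(1)} - \mu)^2\, dF(x_1^{(1)}).$

The function

$\frac{1}{n(n-1)} \sum_{\alpha \ne \beta} \frac{(x_\alpha^{(1)} - x_\beta^{(1)})^2}{2} = \frac{1}{n-1} \sum_{\alpha} \Big( x_\alpha^{(1)} - \frac{1}{n} \sum_{\beta} x_\beta^{(1)} \Big)^2$

is a distribution-free unbiased estimate of $\sigma^2$. The function

$\frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{2}\, \Gamma\left(\frac{n}{2}\right)} \sqrt{\sum_{\alpha} \Big( x_\alpha^{(1)} - \frac{1}{n} \sum_{\beta} x_\beta^{(1)} \Big)^2}$

is known to be an unbiased estimate of $\sigma$ over the set of univariate normal d.f.'s, but it is not a d.-f. u.e.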
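The identity between the pairwise form of the variance estimate and the usual form with divisor n − 1 can be checked numerically. The following is a minimal sketch, not part of the original paper; the function name `variance_u_statistic` and the sample values are illustrative assumptions.

```python
from itertools import permutations
from statistics import variance

def variance_u_statistic(xs):
    """Average of the kernel (x_a - x_b)^2 / 2 over all ordered pairs with a != b."""
    n = len(xs)
    # permutations(xs, 2) enumerates ordered pairs of distinct positions
    total = sum((a - b) ** 2 / 2 for a, b in permutations(xs, 2))
    return total / (n * (n - 1))

sample = [2.0, 3.5, 1.0, 4.0, 6.5]   # arbitrary illustrative data
u_value = variance_u_statistic(sample)
s2_value = variance(sample)          # usual estimate with divisor n - 1
print(u_value, s2_value)
```

Both numbers agree up to rounding, which is the content of the displayed identity for this particular sample.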


4. U-statistics. Let $x_1, \dots, x_n$ be a sample of n vectors (2.1) and $\Phi(x_1, \dots, x_m)$ a function of $m\ (\le n)$ vector arguments. Consider the function of the sample,

(4.1)  $U = U(x_1, \dots, x_n) = \frac{1}{n(n-1) \cdots (n-m+1)} \sum'' \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where $\sum''$ stands for summation over all permutations $(\alpha_1, \dots, \alpha_m)$ of m integers such that

(4.2)  $1 \le \alpha_i \le n, \quad \alpha_i \ne \alpha_j \text{ if } i \ne j, \quad (i, j = 1, \dots, m).$

U is the average of the values of $\Phi$ in the set of ordered subsets of m members of the sample (2.1). U is symmetric in $x_1, \dots, x_n$.

Any statistic of the form (4.1) will be called a U-statistic. Any function $\Phi(x_1, \dots, x_m)$ satisfying (4.1) will be referred to as a kernel of the statistic U.

If $\Phi(x_1, \dots, x_m)$ is a kernel of a regular functional $\theta(F)$ defined on a set $\mathcal{D}$, then U is an unbiased estimate of $\theta(F)$ over $\mathcal{D}$:

(4.3)  $\theta(F) = \int \cdots \int U(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n)$

for every F in $\mathcal{D}$.










For n = m, U reduces to the symmetric kernel (3.3) of $\theta(F)$.

From a recent paper by Halmos [9] it follows for the case of univariate d.f.'s (r = 1):

If $\theta(F)$ is a regular functional of degree m over a set $\mathcal{D}$ containing all purely discontinuous d.f.'s, U is the only unbiased estimate over $\mathcal{D}$ which is symmetric in $x_1, \dots, x_n$, and U has the least variance among all unbiased estimates over $\mathcal{D}$.

These results and the proofs given by Halmos can easily be extended to the multivariate case (r > 1).

Combining (3.3) and (4.1) we may write a U-statistic in the form

(4.4)  $U(x_1, \dots, x_n) = \binom{n}{m}^{-1} \sum' \Phi_0(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the kernel $\Phi_0$ is symmetric in its m vector arguments and the sum $\sum'$ is extended over all subscripts $\alpha$ such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n.$
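The equivalence of the permutation form (4.1) and the combination form (4.4) can be illustrated with a short sketch. It is not taken from the paper: the helper names are hypothetical, and the non-symmetric kernel x1² − x1·x2 is assumed here as a kernel of the variance, whose symmetrization by (3.3) is (x1 − x2)²/2.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

def u_permutation_form(kernel, xs, m):
    """(4.1): average of the kernel over all ordered m-tuples of distinct sample members."""
    n = len(xs)
    return sum(kernel(*t) for t in permutations(xs, m)) / perm(n, m)

def symmetrize(kernel, m):
    """(3.3): average the kernel over all m! orderings of its arguments."""
    def phi0(*args):
        return sum(kernel(*p) for p in permutations(args)) / factorial(m)
    return phi0

def u_combination_form(sym_kernel, xs, m):
    """(4.4): average of a symmetric kernel over all m-element subsets."""
    return sum(sym_kernel(*t) for t in combinations(xs, m)) / comb(len(xs), m)

def raw_kernel(x1, x2):
    # assumed non-symmetric kernel of the variance
    return x1 * x1 - x1 * x2

sample = [2.0, 3.5, 1.0, 4.0, 6.5]   # arbitrary illustrative data
u_41 = u_permutation_form(raw_kernel, sample, 2)
u_44 = u_combination_form(symmetrize(raw_kernel, 2), sample, 2)
print(u_41, u_44)
```

The brute-force enumeration costs on the order of n^m kernel evaluations, so it is meant only for small illustrative samples.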





Another statistic frequently used for estimating $\theta(F)$ is $\theta(S)$, where S = S(x) is the d.f. of the sample (2.1). If S is substituted for F in (3.2), we have

(4.5)  $\theta(S) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}).$

In particular, the sample moments have this form; their kernel $\Phi$ is obtained by the method described in section 3.

If m = 1, $\theta(S) = U$. If m = 2,

$\theta(S) = \frac{n-1}{n}\, U + \frac{1}{n^2} \sum_{\alpha=1}^{n} \Phi(x_\alpha, x_\alpha),$

and $\theta(S)$ is a linear function of U-statistics with coefficients depending on n. This is easily seen to be true for any m. In general $\theta(S)$ is not an unbiased estimate of $\theta(F)$. If, however, the expected value of $\theta(S)$ exists for every F in $\mathcal{D}$, we have

$E\{\theta(S)\} = \theta(F) + O(n^{-1}),$

and the estimate $\theta(S)$ of $\theta(F)$ may be termed unbiased in the limit over $\mathcal{D}$.
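For m = 2 the relation between θ(S) and U can be verified directly on a small sample. The sketch below is illustrative only; it assumes the kernel Φ(a, b) = ab, a kernel of the squared mean, and the function names are hypothetical.

```python
from itertools import permutations, product

def theta_s(kernel, xs):
    """(4.5) with m = 2: average of the kernel over all n^2 index pairs, repeats included."""
    n = len(xs)
    return sum(kernel(a, b) for a, b in product(xs, repeat=2)) / n ** 2

def u_stat(kernel, xs):
    """(4.1) with m = 2: average of the kernel over ordered pairs of distinct indices."""
    n = len(xs)
    return sum(kernel(a, b) for a, b in permutations(xs, 2)) / (n * (n - 1))

def product_kernel(a, b):
    # assumed kernel of the squared mean
    return a * b

sample = [1.0, 2.0, 4.0]             # arbitrary illustrative data
n = len(sample)
lhs = theta_s(product_kernel, sample)
diagonal = sum(product_kernel(x, x) for x in sample) / n ** 2
rhs = (n - 1) / n * u_stat(product_kernel, sample) + diagonal
print(lhs, rhs)
```

Here θ(S) is the square of the sample mean, and the diagonal term is what makes it biased for finite n.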

Numerous statistics in current use have the form of, or can be expressed in terms of, U-statistics. From what was said above about moments as regular functionals, it is easy to obtain U-statistics which are d.-f. u.e.'s of the moments about the mean of any order (cf. Halmos [9]). Fisher's k-statistics are U-statistics, as follows from their definition as unbiased estimates of the cumulants, symmetric in the sample values. Another example is Gini's mean difference

$\frac{1}{n(n-1)} \sum_{\alpha \ne \beta} | x_\alpha^{(1)} - x_\beta^{(1)} |.$

More examples, in particular of rank correlation statistics, will be given in section 9.
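As a small illustration of a U-statistic with a kernel that is not a polynomial, Gini's mean difference can be computed as the average of |x_a − x_b| over all pairs. This is a sketch with a hypothetical function name; averaging over unordered pairs gives the same value as averaging over ordered pairs because the kernel is symmetric.

```python
from itertools import combinations
from math import comb

def gini_mean_difference(xs):
    """Average absolute difference over all unordered pairs of sample members."""
    n = len(xs)
    return sum(abs(a - b) for a, b in combinations(xs, 2)) / comb(n, 2)

g_value = gini_mean_difference([1.0, 2.0, 4.0])
print(g_value)
```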






5. The variance of a U-statistic. Let $X_1, \dots, X_n$ be n independent random vectors with the same d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

(5.1)  $U = U(X_1, \dots, X_n) = \binom{n}{m}^{-1} \sum' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$

where $\Phi(x_1, \dots, x_m)$ is symmetric in $x_1, \dots, x_m$ and $\sum'$ has the same meaning as in (4.4). Suppose that the function $\Phi$ does not involve n.

If $\theta = \theta(F)$ is defined by (3.2), we have

$E\{U\} = E\{\Phi(X_1, \dots, X_m)\} = \theta.$

Let

(5.2)  $\Phi_c(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{c+1}, \dots, X_m)\}, \quad (c = 1, \dots, m),$

where $x_1, \dots, x_c$ are arbitrary fixed vectors and the expected value is taken with respect to the random vectors $X_{c+1}, \dots, X_m$. Then

(5.3)  $\Phi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Phi_c(x_1, \dots, x_{c-1}, X_c)\},$

and

(5.4)  $E\{\Phi_c(X_1, \dots, X_c)\} = \theta, \quad (c = 1, \dots, m).$

Define

(5.5)  $\Psi(x_1, \dots, x_m) = \Phi(x_1, \dots, x_m) - \theta,$

(5.6)  $\Psi_c(x_1, \dots, x_c) = \Phi_c(x_1, \dots, x_c) - \theta, \quad (c = 1, \dots, m).$

We have

(5.7)  $\Psi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Psi_c(x_1, \dots, x_{c-1}, X_c)\},$

(5.8)  $E\{\Psi_c(X_1, \dots, X_c)\} = E\{\Psi(X_1, \dots, X_m)\} = 0, \quad (c = 1, \dots, m).$

Suppose that the variance of $\Psi_c(X_1, \dots, X_c)$ exists, and let

(5.9)  $\zeta_0 = 0, \quad \zeta_c = E\{\Psi_c^2(X_1, \dots, X_c)\}, \quad (c = 1, \dots, m).$

We have

(5.10)  $\zeta_c = E\{\Phi_c^2(X_1, \dots, X_c)\} - \theta^2.$

$\zeta_c = \zeta_c(F)$ is a polynomial in regular functionals of F, and hence itself a regular functional of F (of degree $\le 2m$).

If, for some parent distribution $F = F_0$ and some integer d, we have $\zeta_d(F_0) = 0$, this means that $\Psi_d(X_1, \dots, X_d) = 0$ with probability 1. By (5.7) and (5.9), $\zeta_d = 0$ implies $\zeta_1 = \cdots = \zeta_{d-1} = 0$.












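If $\zeta_1(F_0) = 0$, we shall say that the regular functional $\theta(F)$ is stationary³ for $F = F_0$. If

(5.11)  $\zeta_1(F_0) = \cdots = \zeta_d(F_0) = 0, \quad \zeta_{d+1}(F_0) > 0, \quad (1 \le d \le m),$

$\theta(F)$ will be called stationary of order d for $F = F_0$.

If $(\alpha_1, \dots, \alpha_m)$ and $(\beta_1, \dots, \beta_m)$ are two sets of m different integers, $1 \le \alpha_i, \beta_i \le n$, and c is the number of integers common to the two sets, we have, by the symmetry of $\Psi$,

(5.12)  $E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m}) \Psi(X_{\beta_1}, \dots, X_{\beta_m})\} = \zeta_c.$

If the variance of U exists, it is equal to

$\sigma^2(U) = \binom{n}{m}^{-2} E\{\sum' \Psi(X_{\alpha_1}, \dots, X_{\alpha_m})\}^2 = \binom{n}{m}^{-2} \sum_{c=0}^{m} \sum^{(c)} E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m}) \Psi(X_{\beta_1}, \dots, X_{\beta_m})\},$

where $\sum^{(c)}$ stands for summation over all subscripts such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n, \quad 1 \le \beta_1 < \beta_2 < \cdots < \beta_m \le n,$

and exactly c equations

$\alpha_i = \beta_j$

are satisfied. By (5.12), each term in $\sum^{(c)}$ is equal to $\zeta_c$. The number of terms in $\sum^{(c)}$ is easily seen to be

$\frac{n(n-1) \cdots (n-2m+c+1)}{c!\,(m-c)!\,(m-c)!} = \binom{m}{c} \binom{n-m}{m-c} \binom{n}{m},$

and hence, since $\zeta_0 = 0$,

(5.13)  $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c} \zeta_c.$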
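The count of pairs of m-element index sets with exactly c common members, which underlies (5.13), can be confirmed by brute force for small n and m. The sketch below is an illustration with hypothetical function names, not part of the original derivation.

```python
from itertools import combinations
from math import comb

def count_pairs_by_overlap(n, m):
    """counts[c] = number of ordered pairs of m-subsets of {0..n-1} sharing exactly c members."""
    counts = [0] * (m + 1)
    subsets = [frozenset(s) for s in combinations(range(n), m)]
    for a in subsets:
        for b in subsets:
            counts[len(a & b)] += 1
    return counts

def count_formula(n, m, c):
    """Closed form stated in the text: C(m, c) * C(n - m, m - c) * C(n, m)."""
    return comb(m, c) * comb(n - m, m - c) * comb(n, m)

brute = count_pairs_by_overlap(6, 3)
closed = [count_formula(6, 3, c) for c in range(4)]
print(brute, closed)
```

Dividing each count by the square of C(n, m) and weighting by ζ_c gives the coefficients in (5.13).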
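When the distributions of $X_1, \dots, X_n$ are different, $F_\nu(x)$ being the d.f. of $X_\nu$, let

(5.14)  $\theta_{\alpha_1, \dots, \alpha_m} = E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m})\},$

(5.15)  $\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{\beta_1}, \dots, X_{\beta_{m-c}})\} - \theta_{\alpha_1, \dots, \alpha_c, \beta_1, \dots, \beta_{m-c}}, \quad (c = 1, \dots, m),$

(5.16)  $\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\, \gamma_1, \dots, \gamma_{m-c}} = E\{\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\, \Psi_{c(\alpha_1, \dots, \alpha_c)\gamma_1, \dots, \gamma_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\},$

(5.17)  $\zeta_{c,n} = \frac{c!\,(m-c)!\,(m-c)!}{n(n-1) \cdots (n-2m+c+1)} \sum \zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\, \gamma_1, \dots, \gamma_{m-c}},$

where the sum is extended over all subscripts $\alpha, \beta, \gamma$ such that

$1 \le \alpha_1 < \cdots < \alpha_c \le n, \quad 1 \le \beta_1 < \cdots < \beta_{m-c} \le n, \quad 1 \le \gamma_1 < \cdots < \gamma_{m-c} \le n,$

$\alpha_i \ne \beta_j, \quad \alpha_i \ne \gamma_j, \quad \beta_i \ne \gamma_j.$

Then the variance of U is equal to

(5.18)  $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c} \zeta_{c,n}.$

³ According to the definition of the derivative of a functional (cf. Volterra [21]; for functionals of d.f.'s cf. von Mises [18]), the function $m(m-1) \cdots (m-d+1)\, \Psi_d(x_1, \dots, x_d)$, which is a functional of F, is a d-th derivative of $\theta(F)$ with respect to F at the "point" F of the space of d.f.'s.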


Returning to the case of identically distributed X's, we shall now prove some inequalities satisfied by $\zeta_1, \dots, \zeta_m$ and $\sigma^2(U)$ which are contained in the following theorems:

THEOREM 5.1. The quantities $\zeta_1, \dots, \zeta_m$ as defined by (5.9) satisfy the inequalities

(5.19)  $0 \le \frac{\zeta_c}{c} \le \frac{\zeta_d}{d} \quad \text{if } 1 \le c < d \le m.$

THEOREM 5.2. The variance $\sigma^2(U_n)$ of a U-statistic $U_n = U(X_1, \dots, X_n)$, where $X_1, \dots, X_n$ are independent and identically distributed, satisfies the inequalities

(5.20)  $\frac{m^2}{n} \zeta_1 \le \sigma^2(U_n) \le \frac{m}{n} \zeta_m.$

$n\sigma^2(U_n)$ is a decreasing function of n,

(5.21)  $(n+1)\sigma^2(U_{n+1}) \le n\sigma^2(U_n),$

which takes on its upper bound $m\zeta_m$ for n = m and tends to its lower bound $m^2\zeta_1$ as n increases:

(5.22)  $\sigma^2(U_m) = \zeta_m,$

(5.23)  $\lim_{n \to \infty} n\sigma^2(U_n) = m^2 \zeta_1.$

If $E\{U_n\} = \theta(F)$ is stationary of order $\ge d - 1$ for the d.f. of $X_\alpha$, (5.20) may be replaced by

(5.24)  $\frac{m}{d} K_n(m, d)\, \zeta_d \le \sigma^2(U_n) \le K_n(m, d)\, \zeta_m,$

where

(5.25)  $K_n(m, d) = \binom{n}{m}^{-1} \sum_{c=d}^{m} \binom{m-1}{c-1} \binom{n-m}{m-c}.$
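The statements of Theorem 5.2 can be checked in exact arithmetic for a concrete kernel. The sketch below assumes the variance kernel (x1 − x2)²/2 with a standard normal parent distribution, for which a direct calculation gives ζ_1 = 1/2 and ζ_2 = 2; these values and the function names are assumptions made for the illustration and do not appear in the text.

```python
from fractions import Fraction
from math import comb

def var_u(n, m, zeta):
    """(5.13): variance of U_n; zeta[c] holds zeta_c for c = 0..m, with zeta[0] = 0."""
    return sum(Fraction(comb(m, c) * comb(n - m, m - c), comb(n, m)) * zeta[c]
               for c in range(1, m + 1))

def k_n(n, m, d):
    """(5.25): the factor K_n(m, d)."""
    total = sum(comb(m - 1, c - 1) * comb(n - m, m - c) for c in range(d, m + 1))
    return Fraction(total, comb(n, m))

m = 2
zeta = [Fraction(0), Fraction(1, 2), Fraction(2)]   # assumed: variance kernel, N(0, 1) parent

# (5.20): m^2 zeta_1 / n <= var <= m zeta_m / n for every n >= m
bounds_hold = all(
    Fraction(m * m, n) * zeta[1] <= var_u(n, m, zeta) <= Fraction(m, n) * zeta[m]
    for n in range(m, 40)
)
# (5.21): n * var(U_n) does not increase with n
monotone = all(
    (n + 1) * var_u(n + 1, m, zeta) <= n * var_u(n, m, zeta)
    for n in range(m, 40)
)
print(bounds_hold, monotone, var_u(10, m, zeta), k_n(10, m, 1))
```

Under these assumptions (5.13) reduces to 2/(n − 1), the familiar variance of the unbiased sample variance for a standard normal sample, and K_n(m, 1) equals m/n, so that (5.24) with d = 1 reproduces (5.20).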


We postpone the proofs of Theorems 5.1 and 5.2.

(5.13) and (5.19) imply that a necessary and sufficient condition for the existence of $\sigma^2(U)$ is the existence of

(5.26)  $\zeta_m = E\{\Phi^2(X_1, \dots, X_m)\} - \theta^2$

or that of $E\{\Phi^2(X_1, \dots, X_m)\}$.

If $\zeta_1 > 0$, $\sigma^2(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order d for $F = F_0$, that is, if (5.11) is satisfied, $\sigma^2(U)$ is of order $n^{-d-1}$. Only if, for some $F = F_0$, $\theta(F)$ is stationary of order m, where m is the degree of $\theta(F)$, we have $\sigma^2(U) = 0$, and U is equal to a constant with probability 1.

For instance, if $\theta(F_0) = 0$, the functional $\theta^2(F)$ is stationary for $F = F_0$. Other examples of stationary "points" of a functional will be found in section 9d.

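For proving Theorem 5.1 we shall require the following:

LEMMA 5.1. If

(5.27)  $\delta_d = \zeta_d - \binom{d}{1} \zeta_{d-1} + \binom{d}{2} \zeta_{d-2} - \cdots + (-1)^{d-1} \binom{d}{d-1} \zeta_1,$

we have

(5.28)  $\delta_d \ge 0, \quad (d = 1, \dots, m),$

and

(5.29)  $\zeta_d = \delta_d + \binom{d}{1} \delta_{d-1} + \cdots + \binom{d}{d-1} \delta_1.$

PROOF. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$\eta_0 = \theta^2, \quad \eta_c = E\{\Phi_c^2(X_1, \dots, X_c)\}, \quad (c = 1, \dots, m).$

Then, by (5.10),

$\zeta_c = \eta_c - \eta_0,$

and on substituting this in (5.27) we have

$\delta_d = \sum_{c=0}^{d} (-1)^{d-c} \binom{d}{c} \eta_c.$

From (5.9) it is seen that (5.28) is true for d = 1. Suppose that (5.28) holds for $1, \dots, d-1$. Then (5.28) will be shown to hold for d.

Let

$\bar{\Phi}_0(x_1) = \Phi_1(x_1) - \theta, \quad \bar{\Phi}_c(x_1, x_2, \dots, x_{c+1}) = \Phi_{c+1}(x_1, \dots, x_{c+1}) - \Phi_c(x_2, \dots, x_{c+1}), \quad (c = 1, \dots, d-1).$

For an arbitrary fixed $x_1$, let

$\bar{\eta}_c(x_1) = E\{\bar{\Phi}_c^2(x_1, X_2, \dots, X_{c+1})\}, \quad (c = 0, \dots, d-1).$

Then, by induction hypothesis,

$\bar{\delta}_{d-1}(x_1) = \sum_{c=0}^{d-1} (-1)^{d-1-c} \binom{d-1}{c} \bar{\eta}_c(x_1) \ge 0$

for any fixed $x_1$.

Now,

$E\{\bar{\eta}_c(X_1)\} = \eta_{c+1} - \eta_c,$

and hence

$E\{\bar{\delta}_{d-1}(X_1)\} = \sum_{c=0}^{d-1} (-1)^{d-1-c} \binom{d-1}{c} (\eta_{c+1} - \eta_c) = \sum_{c=0}^{d} (-1)^{d-c} \binom{d}{c} \eta_c = \delta_d.$

The proof of Lemma 5.1 is complete.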
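The pair (5.27) and (5.29) is a binomial inversion, which can be illustrated numerically. The following sketch uses hypothetical function names and made-up values ζ = (1, 3, 7); it only illustrates the algebraic relation between the two formulas, not the inequality (5.28) itself.

```python
from math import comb

def delta_from_zeta(zeta):
    """(5.27): zeta = [zeta_1, ..., zeta_m]; returns [delta_1, ..., delta_m]."""
    z = [0] + list(zeta)              # z[0] = zeta_0 = 0
    m = len(zeta)
    return [sum((-1) ** (d - c) * comb(d, c) * z[c] for c in range(1, d + 1))
            for d in range(1, m + 1)]

def zeta_from_delta(delta):
    """(5.29): recover zeta_d as the sum of C(d, a) * delta_a over a = 1..d."""
    dl = [0] + list(delta)
    m = len(delta)
    return [sum(comb(d, a) * dl[a] for a in range(1, d + 1))
            for d in range(1, m + 1)]

zeta_example = [1, 3, 7]              # made-up values for illustration
delta_example = delta_from_zeta(zeta_example)
print(delta_example, zeta_from_delta(delta_example))
```

In this example all δ_d are non-negative and ζ_c/c = 1, 3/2, 7/3 is non-decreasing, in line with what Theorem 5.1 asserts whenever (5.28) holds.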
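PROOF OF THEOREM 5.1. By (5.29) we have for c < d

(5.30)  $c\zeta_d - d\zeta_c = c \sum_{a=1}^{d} \binom{d}{a} \delta_a - d \sum_{a=1}^{c} \binom{c}{a} \delta_a = \sum_{a=1}^{c} \left[ c\binom{d}{a} - d\binom{c}{a} \right] \delta_a + c \sum_{a=c+1}^{d} \binom{d}{a} \delta_a.$

From (5.28), and since $c\binom{d}{a} - d\binom{c}{a} \ge 0$ if $1 \le a \le c \le d$, it follows that each term in the two sums of (5.30) is not negative. This, in connection with (5.9), proves Theorem 5.1.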
PROOF OF THEOREM 5.2. From (5.19) we have

$c\zeta_1 \le \zeta_c \le \frac{c}{m} \zeta_m, \quad (c = 1, \dots, m).$

Applying these inequalities to each term in (5.13) and using the identity

(5.31)  $\binom{n}{m}^{-1} \sum_{c=1}^{m} c \binom{m}{c} \binom{n-m}{m-c} = \frac{m^2}{n},$

we obtain (5.20).

(5.22) and (5.23) follow immediately from (5.13).

For (5.21) we may write

(5.32)  $D_n \ge 0,$

where

$D_n = n\sigma^2(U_n) - (n+1)\sigma^2(U_{n+1}).$

Let

$D_n = \sum_{c=1}^{m} d_{n,c}\, \zeta_c.$

Then we have from (5.13)

(5.33)  $d_{n,c} = n \binom{m}{c} \binom{n-m}{m-c} \binom{n}{m}^{-1} - (n+1) \binom{m}{c} \binom{n+1-m}{m-c} \binom{n+1}{m}^{-1},$

or

$d_{n,c} = \binom{m}{c} \binom{n-m+1}{m-c} (n-m+1)^{-1} \binom{n}{m}^{-1} \{(c-1)n - (m-1)^2\}, \quad (1 \le c \le m \le n).$

Putting

$c_0 = 1 + \left[ \frac{(m-1)^2}{n} \right],$

where [u] denotes the largest integer $\le u$, we have

$d_{n,c} \le 0 \text{ if } c \le c_0, \quad d_{n,c} > 0 \text{ if } c > c_0.$

Hence, by (5.19),

$d_{n,c}\, \zeta_c \ge \frac{1}{c_0} \zeta_{c_0}\, c\, d_{n,c}, \quad (c = 1, \dots, m),$

and

$D_n \ge \frac{1}{c_0} \zeta_{c_0} \sum_{c=1}^{m} c\, d_{n,c}.$

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).

For the stationary case $\zeta_1 = \cdots = \zeta_{d-1} = 0$, (5.24) is a direct consequence of (5.13) and (5.19). The proof of Theorem 5.2 is complete.
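The combinatorial facts used in this proof (the identity (5.31), the closed form of d_{n,c}, and the vanishing of the weighted sum of the d_{n,c}) lend themselves to an exact numerical check. The sketch below uses hypothetical function names and small ranges of m and n chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def d_direct(n, m, c):
    """(5.33): coefficient of zeta_c in D_n, computed from the definition."""
    first = n * Fraction(comb(m, c) * comb(n - m, m - c), comb(n, m))
    second = (n + 1) * Fraction(comb(m, c) * comb(n + 1 - m, m - c), comb(n + 1, m))
    return first - second

def d_closed(n, m, c):
    """Closed form of d_{n,c} with the factor (c - 1) n - (m - 1)^2."""
    base = Fraction(comb(m, c) * comb(n - m + 1, m - c), (n - m + 1) * comb(n, m))
    return base * ((c - 1) * n - (m - 1) ** 2)

def identity_531(n, m):
    """Left-hand side of (5.31)."""
    return Fraction(sum(c * comb(m, c) * comb(n - m, m - c) for c in range(1, m + 1)),
                    comb(n, m))

cases = [(n, m) for m in range(1, 6) for n in range(m, 13)]
closed_form_ok = all(d_direct(n, m, c) == d_closed(n, m, c)
                     for n, m in cases for c in range(1, m + 1))
identity_ok = all(identity_531(n, m) == Fraction(m * m, n) for n, m in cases)
weighted_sum_zero = all(sum(c * d_direct(n, m, c) for c in range(1, m + 1)) == 0
                        for n, m in cases)
print(closed_form_ok, identity_ok, weighted_sum_zero)
```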


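6. The covariance of two U-statistics. Consider a set of g U-statistics,

$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1} \sum' \Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \quad (\gamma = 1, \dots, g),$

each $U^{(\gamma)}$ being a function of the same n independent, identically distributed random vectors $X_1, \dots, X_n$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric in its $m(\gamma)$ arguments $(\gamma = 1, \dots, g)$.

Let

$E\{U^{(\gamma)}\} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\} = \theta^{(\gamma)}, \quad (\gamma = 1, \dots, g);$

(6.1)  $\Psi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) = \Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) - \theta^{(\gamma)}, \quad (\gamma = 1, \dots, g);$

(6.2)  $\Psi_c^{(\gamma)}(x_1, \dots, x_c) = E\{\Psi^{(\gamma)}(x_1, \dots, x_c, X_{c+1}, \dots, X_{m(\gamma)})\}, \quad (c = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g);$

(6.3)  $\zeta_c^{(\gamma,\delta)} = E\{\Psi_c^{(\gamma)}(X_1, \dots, X_c)\, \Psi_c^{(\delta)}(X_1, \dots, X_c)\}, \quad (\gamma, \delta = 1, \dots, g).$

If, in particular, $\gamma = \delta$, we shall write

(6.4)  $\zeta_c^{(\gamma)} = \zeta_c^{(\gamma,\gamma)} = E\{\Psi_c^{(\gamma)}(X_1, \dots, X_c)\}^2.$

Let

$\sigma(U^{(\gamma)}, U^{(\delta)}) = E\{(U^{(\gamma)} - \theta^{(\gamma)})(U^{(\delta)} - \theta^{(\delta)})\}$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma) \le m(\delta)$,

(6.5)  $\sigma(U^{(\gamma)}, U^{(\delta)}) = \binom{n}{m(\gamma)}^{-1} \sum_{c=1}^{m(\gamma)} \binom{m(\delta)}{c} \binom{n - m(\delta)}{m(\gamma) - c} \zeta_c^{(\gamma,\delta)}.$

The right hand side is easily seen to be symmetric in $\gamma, \delta$.

For $\gamma = \delta$, (6.5) is the variance of $U^{(\gamma)}$ (cf. (5.13)).

We have from (5.23) and (6.5)

$\lim_{n \to \infty} n\sigma^2(U^{(\gamma)}) = m^2(\gamma)\, \zeta_1^{(\gamma)},$

$\lim_{n \to \infty} n\sigma(U^{(\gamma)}, U^{(\delta)}) = m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)}.$

Hence, if $\zeta_1^{(\gamma)} \ne 0$ and $\zeta_1^{(\delta)} \ne 0$, the product moment correlation $\rho(U^{(\gamma)}, U^{(\delta)})$ between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

(6.6)  $\lim_{n \to \infty} \rho(U^{(\gamma)}, U^{(\delta)}) = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)} \zeta_1^{(\delta)}}}.$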
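The symmetry of the coefficients in (6.5) and the limit of n times the leading coefficient can be checked in exact arithmetic. The sketch below uses a hypothetical function name and arbitrarily chosen degrees m(γ) = 2, m(δ) = 3; it is an illustration, not part of the original argument.

```python
from fractions import Fraction
from math import comb

def cov_coeff(n, m_gamma, m_delta, c):
    """Coefficient of zeta_c^(gamma, delta) in (6.5)."""
    return Fraction(comb(m_delta, c) * comb(n - m_delta, m_gamma - c), comb(n, m_gamma))

# Swapping the roles of gamma and delta leaves every coefficient unchanged.
symmetric = all(
    cov_coeff(n, 2, 3, c) == cov_coeff(n, 3, 2, c)
    for n in range(3, 30) for c in (1, 2)
)

# n times the c = 1 coefficient approaches m(gamma) * m(delta) = 6 for large n.
big_n = 10 ** 6
leading = float(big_n * cov_coeff(big_n, 2, 3, 1))
print(symmetric, leading)
```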
$1 


7. Limit theorems for the case of identically distributed $X_\alpha$'s. We shall now study the asymptotic distribution of U-statistics and certain related functions. In this section the vectors $X_\alpha$ will be assumed to be identically distributed. An extension to the case of different parent distributions will be given in section 8.

Following Cramér [2, p. 83] we shall say that a sequence of d.f.'s $F_1(x), F_2(x), \dots$ converges to a d.f. F(x) if $\lim F_n(x) = F(x)$ in every point at which the one-dimensional marginal limiting d.f.'s are continuous.

Let us recall (cf. Cramér [2, p. 312]) that a g-variate normal distribution is called non-singular if the rank r of its covariance matrix is equal to g, and singular if r < g.

The following lemma will be used in the proofs.

LEMMA 7.1. Let $V_1, V_2, \dots$ be an infinite sequence of random vectors $V_n = (V_n^{(1)}, \dots, V_n^{(g)})$, and suppose that the d.f. $F_n(v)$ of $V_n$ tends to a d.f. F(v) as $n \to \infty$. Let $V_n^{(\gamma)\prime} = V_n^{(\gamma)} + d_n^{(\gamma)}$, where

(7.1)  $\lim_{n \to \infty} E\{d_n^{(\gamma)}\}^2 = 0, \quad (\gamma = 1, \dots, g).$

Then the d.f. of $V_n' = (V_n^{(1)\prime}, \dots, V_n^{(g)\prime})$ tends to F(v).

This is an immediate consequence of the well-known fact that the d.f. of $V_n'$ tends to F(v) if $d_n^{(\gamma)}$ converges in probability to 0 (cf. Cramér [2, p. 299]), since the fulfillment of (7.1) is sufficient for the latter condition.

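THEOREM 7.1. Let $X_1, \dots, X_n$ be n independent, identically distributed random vectors,

$X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \quad (\alpha = 1, \dots, n).$

Let

$\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}), \quad (\gamma = 1, \dots, g),$

be g real-valued functions not involving n, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)\ (\le n)$ vector arguments $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)})$, $(\alpha = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g)$. Define

(7.2)  $U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1} \sum' \Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \quad (\gamma = 1, \dots, g),$

where the summation is over all subscripts such that $1 \le \alpha_1 < \cdots < \alpha_{m(\gamma)} \le n$. Then, if the expected values

(7.3)  $\theta^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}, \quad (\gamma = 1, \dots, g),$

and

(7.4)  $E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2, \quad (\gamma = 1, \dots, g),$

exist, the joint d.f. of

$\sqrt{n}(U^{(1)} - \theta^{(1)}), \dots, \sqrt{n}(U^{(g)} - \theta^{(g)})$

tends, as $n \to \infty$, to the g-variate normal d.f. with zero means and covariance matrix $(m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)})$, where $\zeta_1^{(\gamma,\delta)}$ is defined by (6.3). The limiting distribution is non-singular if the determinant $|\zeta_1^{(\gamma,\delta)}|$ is positive.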
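For g = 1 the statement can be illustrated by simulation. The sketch below is a rough Monte Carlo check under stated assumptions, not a proof: it takes the variance kernel (x1 − x2)²/2 with a standard normal parent, for which θ = 1, m = 2 and ζ_1 = 1/2, so the limiting variance m²ζ_1 equals 2. The function name, sample size, replication count and seed are arbitrary choices.

```python
import random

def simulate_scaled_u(n, reps, seed=0):
    """Draw reps values of sqrt(n) * (U - theta) for the variance kernel, N(0, 1) parent."""
    rng = random.Random(seed)
    theta = 1.0                               # variance of the assumed parent distribution
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        # U-statistic with kernel (x1 - x2)^2 / 2 equals the unbiased sample variance
        u = sum((x - mean) ** 2 for x in xs) / (n - 1)
        draws.append(n ** 0.5 * (u - theta))
    return draws

draws = simulate_scaled_u(n=200, reps=2000)
sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sim_mean, sim_var)   # expected to be near 0 and near m^2 * zeta_1 = 2
```

The simulated variance is only an estimate; with these settings it should fall reasonably close to 2, and the exact finite-sample value n·σ²(U_n) = 2n/(n − 1) lies slightly above the limit.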

Before proving Theorem 7.1, a few words may be said about its meaning and its relation to well-known results.

For g = 1, Theorem 7.1 states that the distribution of a U-statistic tends, under certain conditions, to the normal form. For m = 1, U is the sum of n independent random variables, and in this case Theorem 7.1 reduces to the Central Limit Theorem for such sums. For m > 1, U is a sum of random variables which, in general, are not independent. Under certain assumptions about the function $\Phi(x_1, \dots, x_m)$ the asymptotic normality of U can be inferred from the Central Limit Theorem by well-known methods. If, for instance, $\Phi$ is a polynomial (as in the case of the k-statistics or the unbiased estimates of moments), U can be expressed as a polynomial in moments about the origin which are sums of independent random variables, and for this case the tendency to normality of U can easily be shown (cf. Cramér [2, p. 365]).

Theorem 7.1 generalizes these results, stating that in the case of independent and identically distributed $X_\alpha$'s the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$ is sufficient for the asymptotic normality of U. No regularity conditions are imposed on the function $\Phi$. This point is important for some applications (cf. section 9).

Theorem 7.1 and the following theorems of sections 7 and 8 are closely related to recent results of von Mises [18] which were published after this paper was essentially completed. It will be seen below (Theorem 7.4) that the limiting distribution of $\sqrt{n}[U - \theta(F)]$ is the same as that of $\sqrt{n}[\theta(S) - \theta(F)]$ (cf. (4.5)) if the variance of $\theta(S)$ exists. $\theta(S)$ is a differentiable statistical function in the sense of von Mises, and by Theorem I of [18], $\sqrt{n}[\theta(S) - \theta(F)]$ is asymptotically normal if certain conditions are satisfied. It will be found that in certain cases, for instance if the kernel $\Phi$ of $\theta$ is a polynomial, the conditions of the theorems of sections 7 and 8 are somewhat weaker than those of von Mises' theorem. Though von Mises' paper is concerned with functionals of univariate d.f.'s only, its results can easily be extended to the multivariate case.

For the particular case of a discrete population (where F is a step function), U and $\theta(S)$ are polynomials in the sample frequencies, and their asymptotic distribution may be inferred from the fact that the joint distribution of the frequencies tends to the normal form (cf. also von Mises [18]).

In Theorem 7.1 the functions $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ are supposed to be symmetric. Since, as has been seen in section 4, any U-statistic with non-symmetric kernel can be written in the form (4.4) with a symmetric kernel, this restriction is not essential and has been made only for the sake of convenience. Moreover, in the condition of the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$, the symmetric kernel may be replaced by a non-symmetric one. For, if $\Phi$ is non-symmetric, and $\Phi_0$ is the symmetric kernel defined by (3.3), $E\{\Phi_0^2(X_1, \dots, X_m)\}$ is a linear combination of terms of the form $E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}) \Phi(X_{\beta_1}, \dots, X_{\beta_m})\}$, whose existence follows from that of $E\{\Phi^2(X_1, \dots, X_m)\}$ by Schwarz's inequality.

If the regular functional $\theta(F)$ is stationary for $F = F_0$, that is, if $\zeta_1 = \zeta_1(F_0) = 0$ (cf. section 5), the limiting normal distribution of $\sqrt{n}(U - \theta)$ is, according to Theorem 7.1, singular, that is, its variance is zero. As has been seen in section 5, $\sigma^2(U)$ need not be zero in this case, but may be of some order $n^{-c}$, $(c = 2, 3, \dots, m)$, and the distribution of $n^{c/2}(U - \theta)$ may tend to a limiting form which is not normal. According to von Mises [18], it is a limiting distribution of type c, $(c = 2, 3, \dots)$.





According to Theorem 5.2, $\sigma^2(U)$ exceeds its asymptotic value $m^2\zeta_1/n$ for any finite n. Hence, if we apply Theorem 7.1 for approximating the distribution of U when n is large but finite, we underestimate the variance of U. For many applications this is undesirable, and for such cases the following theorem, which is an immediate consequence of Theorem 7.1, will be more useful.

THEOREM 7.2. Under the conditions of Theorem 7.1, and if

$\zeta_1^{(\gamma)} > 0, \quad (\gamma = 1, \dots, g),$

the joint d.f. of

$(U^{(1)} - \theta^{(1)})/\sigma(U^{(1)}), \dots, (U^{(g)} - \theta^{(g)})/\sigma(U^{(g)})$

tends, as $n \to \infty$, to the g-variate normal d.f. with zero means and covariance matrix $(\rho^{(\gamma,\delta)})$, where

$\rho^{(\gamma,\delta)} = \lim_{n \to \infty} \frac{\sigma(U^{(\gamma)}, U^{(\delta)})}{\sigma(U^{(\gamma)})\, \sigma(U^{(\delta)})} = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)} \zeta_1^{(\delta)}}}, \quad (\gamma, \delta = 1, \dots, g).$
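PROOF OF THEOREM 7.1. The existence of (7.4) entails that of

$$\zeta_{m(\gamma)}^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2 - (\theta^{(\gamma)})^2,$$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of
$\zeta_1^{(\gamma)}, \dots, \zeta_{m(\gamma)-1}^{(\gamma)}$, of $\sigma^2(U^{(\gamma)})$, and of $\zeta_1^{(\gamma,\delta)} \le \sqrt{\zeta_1^{(\gamma)}\zeta_1^{(\delta)}}$.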


Now, consider the g quantities

$$Y^{(\gamma)} = \frac{m(\gamma)}{\sqrt{n}} \sum_{\alpha=1}^{n} \Psi_1^{(\gamma)}(X_\alpha), \qquad (\gamma = 1, \dots, g),$$

where $\Psi_1^{(\gamma)}(x)$ is defined by (6.2). $Y^{(1)}, \dots, Y^{(g)}$ are sums of n independent
random variables with zero means, whose covariance matrix, by virtue of (6.3), is

$$\{\sigma(Y^{(\gamma)}, Y^{(\delta)})\} = \{m(\gamma)m(\delta)\zeta_1^{(\gamma,\delta)}\}. \tag{7.5}$$

By the Central Limit Theorem for vectors (cf. Cramér [1, p. 112]), the joint d.f.
of $(Y^{(1)}, \dots, Y^{(g)})$ tends to the normal g-variate d.f. with the same means and
covariances.


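Theorem 7.1 will be proved by showing that the g random variables

$$Z^{(\gamma)} = \sqrt{n}\,(U^{(\gamma)} - \theta^{(\gamma)}), \qquad (\gamma = 1, \dots, g), \tag{7.6}$$

have the same joint limiting distribution as $Y^{(1)}, \dots, Y^{(g)}$.

According to Lemma 7.1 it is sufficient to show that

$$\lim_{n \to \infty} E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = 0, \qquad (\gamma = 1, \dots, g). \tag{7.7}$$

For proving (7.7), write

$$E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = E\{Z^{(\gamma)}\}^2 + E\{Y^{(\gamma)}\}^2 - 2E\{Z^{(\gamma)}Y^{(\gamma)}\}. \tag{7.8}$$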













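By (5.13) we have

$$E\{Z^{(\gamma)}\}^2 = n\sigma^2(U^{(\gamma)}) = m^2(\gamma)\zeta_1^{(\gamma)} + O(n^{-1}), \tag{7.9}$$

and from (7.5),

$$E\{Y^{(\gamma)}\}^2 = m^2(\gamma)\zeta_1^{(\gamma)}. \tag{7.10}$$

By (7.2) and (6.1) we may write for (7.6)

$$Z^{(\gamma)} = \sqrt{n}\binom{n}{m(\gamma)}^{-1} \sum{}' \,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}),$$

and hence

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1} \sum_{\alpha=1}^{n} \sum{}' \,E\{\Psi_1^{(\gamma)}(X_\alpha)\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}.$$

The term

$$E\{\Psi_1^{(\gamma)}(X_\alpha)\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}$$

is $= \zeta_1^{(\gamma)}$ if

$$\alpha_1 = \alpha \ \text{ or } \ \alpha_2 = \alpha \ \dots \ \text{ or } \ \alpha_{m(\gamma)} = \alpha \tag{7.11}$$

and 0 otherwise. For a fixed $\alpha$, the number of sets $\{\alpha_1, \dots, \alpha_{m(\gamma)}\}$ such that
$1 \le \alpha_1 < \dots < \alpha_{m(\gamma)} \le n$ and (7.11) is satisfied, is $\binom{n-1}{m(\gamma)-1}$. Thus,

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1} n \binom{n-1}{m(\gamma)-1}\zeta_1^{(\gamma)} = m^2(\gamma)\zeta_1^{(\gamma)}. \tag{7.12}$$

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.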

The concluding remark in Theorem 7.1 is a direct consequence of the definition 
of a non-singular distribution. The proof of Theorem 7.1 is complete. 
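
The counting step behind (7.12) can be checked mechanically. The following is a minimal sketch, not part of the original argument; the function name `cross_moment_factor` is a hypothetical helper introduced only for this illustration. It confirms that $\binom{n}{m}^{-1} n \binom{n-1}{m-1}$ reduces to $m$, so that the cross moment equals $m^2\zeta_1$.

```python
from fractions import Fraction
from math import comb


def cross_moment_factor(n: int, m: int) -> Fraction:
    """Factor multiplying m * zeta_1 in (7.12).

    For each of the n values of alpha there are C(n-1, m-1) index sets
    containing alpha, and the U-statistic carries the weight 1 / C(n, m).
    """
    return Fraction(n * comb(n - 1, m - 1), comb(n, m))


# The factor reduces to m for every admissible pair (n, m).
for n in range(1, 12):
    for m in range(1, n + 1):
        assert cross_moment_factor(n, m) == m
```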

Theorems 7.1 and 7.2 deal with the asymptotic distribution of $U^{(1)}, \dots, U^{(g)}$,
which are unbiased estimates of $\theta^{(1)}, \dots, \theta^{(g)}$. The unbiasedness of a statistic
is, of course, irrelevant for its asymptotic behavior, and the application of Lemma 
7.1 leads immediately to the following extension of Theorem 7.1 to a larger class 
of statistics. 

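THEOREM 7.3. Let

$$U'^{(\gamma)} = U^{(\gamma)} + \frac{b_n^{(\gamma)}}{\sqrt{n}}, \qquad (\gamma = 1, \dots, g), \tag{7.13}$$

where $U^{(\gamma)}$ is defined by (7.2) and $b_n^{(\gamma)}$ is a random variable. If the conditions of
Theorem 7.1 are satisfied, and $\lim E\{b_n^{(\gamma)}\}^2 = 0$, $(\gamma = 1, \dots, g)$, then the joint
distribution of

$$\sqrt{n}\,(U'^{(1)} - \theta^{(1)}), \ \dots, \ \sqrt{n}\,(U'^{(g)} - \theta^{(g)})$$

tends to the normal distribution with zero means and covariance matrix
$\{m(\gamma)m(\delta)\zeta_1^{(\gamma,\delta)}\}$.
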




This theorem applies, in particular, to the regular functionals $\theta(S)$ of the
sample d.f.,

$$\theta(S) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

in the case that the variance of $\theta(S)$ exists. For we may write

$$n^m \theta(S) = m!\binom{n}{m} U + \sum{}^{*} \,\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

where the sum $\sum^{*}$ is extended over all m-tuplets $(\alpha_1, \dots, \alpha_m)$ in which at least
one equality $\alpha_i = \alpha_j$ $(i \ne j)$ is satisfied. The number of terms in $\sum^{*}$ is of order
$n^{m-1}$. Hence

$$\theta(S) - U = \frac{1}{n} D,$$

where the expected value $E\{D^2\}$, whose existence follows from that of $\sigma^2\{\theta(S)\}$,
is bounded for $n \to \infty$. Thus, if we put $U'^{(\gamma)} = \theta^{(\gamma)}(S)$, the conditions
of Theorem 7.3 are fulfilled. We may summarize this result as follows:
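THEOREM 7.4. Let $X_1, \dots, X_n$ be a random sample from an r-variate population
with d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

$$\theta^{(\gamma)}(F) = \int \cdots \int \Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) \, dF(x_1) \cdots dF(x_{m(\gamma)}), \qquad (\gamma = 1, \dots, g),$$

be g regular functionals of F, where $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ is symmetric in the vectors
$x_1, \dots, x_{m(\gamma)}$ and does not involve n. If $S(x)$ is the d.f. of the random sample,
and if the variance of

$$\theta^{(\gamma)}(S) = \frac{1}{n^{m(\gamma)}} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_{m(\gamma)}=1}^{n} \Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})$$

exists, the joint d.f. of

$$\sqrt{n}\,\{\theta^{(1)}(S) - \theta^{(1)}(F)\}, \ \dots, \ \sqrt{n}\,\{\theta^{(g)}(S) - \theta^{(g)}(F)\}$$

tends to the g-variate normal d.f. with zero means and covariance matrix
$\{m(\gamma)m(\delta)\zeta_1^{(\gamma,\delta)}\}$.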
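
The relation between $U$ and $\theta(S)$ used in Theorem 7.4 can be illustrated on a small sample. The sketch below is only an illustration under stated assumptions: the kernel $\Phi(x_1, x_2) = (x_1 - x_2)^2/2$ (whose functional is the variance) and the function names are choices made here, not taken from the paper. For this kernel the difference $\theta(S) - U$ equals $-U/n$, which is of order $1/n$ as the argument requires.

```python
from itertools import combinations, product


def kernel(x1: float, x2: float) -> float:
    """Symmetric kernel of degree m = 2 whose functional is the variance."""
    return (x1 - x2) ** 2 / 2


def u_statistic(xs):
    """Average of the kernel over all pairs of distinct indices."""
    pairs = list(combinations(xs, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def v_statistic(xs):
    """theta(S): average of the kernel over all n^2 ordered pairs, repeats included."""
    n = len(xs)
    return sum(kernel(a, b) for a, b in product(xs, repeat=2)) / n ** 2


sample = [1.0, 2.0, 4.0, 7.0]
u, v = u_statistic(sample), v_statistic(sample)
# For this kernel, theta(S) - U equals -U / n.
print(u, v, v - u)
```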


The following theorem is concerned with the asymptotic distribution of a
function of statistics of the form $U$ or $U'$.

THEOREM 7.5. Let $(U') = (U'^{(1)}, \dots, U'^{(g)})$ be a random vector, where $U'^{(\gamma)}$
is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied.
If the function $h(y) = h(y^{(1)}, \dots, y^{(g)})$ does not involve n and is continuous together
with its second order partial derivatives in some neighborhood of the point $(y) = (\theta) =
(\theta^{(1)}, \dots, \theta^{(g)})$, then the distribution of the random variable $\sqrt{n}\,\{h(U') - h(\theta)\}$
tends to the normal distribution with mean zero and variance

$$\sum_{\gamma=1}^{g} \sum_{\delta=1}^{g} m(\gamma)m(\delta) \left(\frac{\partial h}{\partial y^{(\gamma)}}\right)_{y=\theta} \left(\frac{\partial h}{\partial y^{(\delta)}}\right)_{y=\theta} \zeta_1^{(\gamma,\delta)}.$$










Theorem 7.5 follows from Theorem 7.3 in exactly the same way as the theorem 
on the asymptotic distribution of a function of moments follows from the fact 
of their asymptotic normality; cf. Cramér [2, p. 366]. We shall therefore omit 
the proof of Theorem 7.5. Since any moment whose variance exists has the
form $U' = \theta(S)$ (cf. section 4 and Theorem 7.4), Theorem 7.5 is a generalization
of the theorem on a function of moments.


8. Limit theorems for $U(X_1, \dots, X_n)$ when the $X_\alpha$'s have different distributions.
The limit theorems of the preceding section can be extended to the
case when the $X_\alpha$'s have different distributions. We shall only prove an extension
to this case of Theorem 7.1 (or 7.2), confining ourselves, for the sake of
simplicity, to the distribution of a single U-statistic.

The extension of Theorems 7.3 and 7.5 with g = 1 to this case is immediate.
One has only to replace the reference to Theorem 7.1 by that to the following
Theorem 8.1, and $\theta$ and $\zeta_1$ by $E\{U\}$ and $\zeta_{1,n}$.

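THEOREM 8.1. Let $X_1, \dots, X_n$ be n independent random vectors of r components,
$X_\alpha$ having the d.f. $F_\alpha(x) = F_\alpha(x^{(1)}, \dots, x^{(r)})$. Let $\Phi(x_1, \dots, x_m)$ be a
function symmetric in its m vector arguments $x_\beta = (x_\beta^{(1)}, \dots, x_\beta^{(r)})$ which does not
involve n, and let

$$\bar\Psi_{1(\nu)}(x) = \binom{n-1}{m-1}^{-1} \sum_{(\ne \nu)}{}' \,\Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}(x), \qquad (\nu = 1, \dots, n), \tag{8.1}$$

where $\Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}$ is defined by (5.15), and the summation is extended over all
subscripts $\alpha$ such that

$$1 \le \alpha_1 < \alpha_2 < \dots < \alpha_{m-1} \le n, \qquad \alpha_i \ne \nu, \quad (i = 1, \dots, m-1).$$

Suppose that there is a number A such that for every n = 1, 2, ...

$$\int \cdots \int \Phi^2(x_1, \dots, x_m) \, dF_{\alpha_1}(x_1) \cdots dF_{\alpha_m}(x_m) < A, \qquad (1 \le \alpha_1 < \alpha_2 < \dots < \alpha_m \le n), \tag{8.2}$$

that

$$E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 < \infty, \qquad (\nu = 1, 2, \dots, n), \tag{8.3}$$

and

$$\lim_{n \to \infty} \ \sum_{\nu=1}^{n} E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 \Big/ \Big\{ \sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} \Big\}^{3/2} = 0. \tag{8.4}$$

Then, as $n \to \infty$, the d.f. of $(U - E\{U\})/\sigma(U)$ tends to the normal d.f. with mean
0 and variance 1.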

The proof is similar to that of Theorem 7.1. 

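Let

$$W = \frac{m}{n} \sum_{\nu=1}^{n} \bar\Psi_{1(\nu)}(X_\nu).$$

It will be shown that
(a) the d.f. of

$$V = \frac{W - E\{W\}}{\sigma(W)}$$

tends to the normal d.f. with mean 0 and variance 1, and that
(b) the d.f. of

$$V' = \frac{U - E\{U\}}{\sigma(U)}$$

tends to the same limit as the d.f. of V.
Part (a) follows immediately from (8.3) and (8.4) by Liapounoff's form of the
Central Limit Theorem.
According to Lemma 7.1, (b) will be proved when it is shown that

$$\lim_{n \to \infty} E\{V' - V\}^2 = \lim_{n \to \infty} \left\{ 2 - 2\,\frac{\sigma(U, W)}{\sigma(U)\sigma(W)} \right\} = 0$$

or

$$\lim_{n \to \infty} \frac{\sigma(U, W)}{\sigma(U)\sigma(W)} = 1. \tag{8.5}$$

Let c be an integer, $1 \le c \le m$, and write

$$x = (x_1, \dots, x_c), \quad y = (y_1, \dots, y_{m-c}), \quad z = (z_1, \dots, z_{m-c}),$$

$$F_{(\alpha)}(x) = F_{\alpha_1}(x_1) \cdots F_{\alpha_c}(x_c), \quad F_{(\beta)}(y) = F_{\beta_1}(y_1) \cdots F_{\beta_{m-c}}(y_{m-c}), \quad F_{(\gamma)}(z) = F_{\gamma_1}(z_1) \cdots F_{\gamma_{m-c}}(z_{m-c}).$$

Then, by Schwarz's inequality,

$$\left| \int\!\!\int\!\!\int \Phi(x, y)\Phi(x, z) \, dF_{(\alpha)}(x) \, dF_{(\beta)}(y) \, dF_{(\gamma)}(z) \right| \le \left\{ \int\!\!\int \Phi^2(x, y) \, dF_{(\alpha)}(x) \, dF_{(\beta)}(y) \int\!\!\int \Phi^2(x, z) \, dF_{(\alpha)}(x) \, dF_{(\gamma)}(z) \right\}^{1/2},$$

which, by (8.2), is < A for any set of subscripts.
By the inequality for moments, $\theta_{\alpha_1, \dots, \alpha_m}$, as defined by (5.14), is also uniformly
bounded, and applying these inequalities to (5.16), it follows that there
exists a number B such that

$$|\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\,\gamma_1, \dots, \gamma_{m-c}}| < B, \qquad (c = 1, \dots, m), \tag{8.6}$$

for every set of subscripts satisfying the inequalities

$$\alpha_g \ne \alpha_h, \ \beta_g \ne \beta_h, \ \gamma_g \ne \gamma_h \ \text{ if } g \ne h; \qquad \alpha_i \ne \beta_j, \ \alpha_i \ne \gamma_j \quad (i = 1, \dots, c; \ j = 1, \dots, m - c).$$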













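Now, we have

$$E\{W\} = 0$$

and

$$\sigma^2(W) = \frac{m^2}{n^2} \sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} \tag{8.7}$$

or, inserting (8.1) and recalling (5.16),

$$\sigma^2(W) = \frac{m^2}{n^2} \binom{n-1}{m-1}^{-2} \sum_{\nu=1}^{n} \ \sum_{(\ne \nu)}{}' \ \sum_{(\ne \nu)}{}' \ \zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\,\beta_1, \dots, \beta_{m-1}}, \tag{8.8}$$

the two sums $\sum'$ being over $\alpha_1 < \dots < \alpha_{m-1}$, $(\alpha_i \ne \nu)$, and $\beta_1 < \dots < \beta_{m-1}$,
$(\beta_i \ne \nu)$, respectively. By (5.17), the sum of the terms whose subscripts
$\nu, \alpha_1, \dots, \alpha_{m-1}, \beta_1, \dots, \beta_{m-1}$ are all different is equal to

$$\frac{n(n-1) \cdots (n-2m+2)}{(m-1)!\,(m-1)!} \, \zeta_{1,n} = n \binom{n-1}{m-1} \binom{n-m}{m-1} \zeta_{1,n}.$$

The number of the remaining terms is of order $n^{2m-2}$. Since, by (8.6), they are
uniformly bounded, we have

$$\sigma^2(W) = \frac{m^2}{n} \zeta_{1,n} + O(n^{-2}). \tag{8.9}$$

Similarly, we have from (5.18)

$$\sigma^2(U) = \frac{m^2}{n} \zeta_{1,n} + O(n^{-2}),$$

and hence

$$\sigma^2(U) = \sigma^2(W) + O(n^{-2}). \tag{8.10}$$

The covariance of U and W is

$$\sigma(U, W) = \frac{m}{n} \binom{n}{m}^{-1} \sum_{\nu=1}^{n} \sum{}' \,E\{\bar\Psi_{1(\nu)}(X_\nu)\Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\}. \tag{8.11}$$

All terms except those in which one of the $\alpha$'s $= \nu$, vanish, and for the remaining
ones we have, for fixed $\alpha_1, \dots, \alpha_m$,

$$E\{\bar\Psi_{1(\nu)}(X_\nu)\Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\} = \binom{n-1}{m-1}^{-1} \sum_{(\ne \nu)}{}' \,E\{\Psi_{1(\nu)\beta_1, \dots, \beta_{m-1}}(X_\nu)\Psi_{1(\nu)\gamma_1, \dots, \gamma_{m-1}}(X_\nu)\}$$

$$= \binom{n-1}{m-1}^{-1} \sum_{(\ne \nu)}{}' \,\zeta_{1(\nu)\beta_1, \dots, \beta_{m-1};\,\gamma_1, \dots, \gamma_{m-1}},$$

where the summation sign refers to the $\beta$'s, and $\gamma_1, \dots, \gamma_{m-1}$ are the $\alpha$'s that
are $\ne \nu$. Inserting this in (8.11) and comparing the result with (8.8), we see that

$$\sigma(U, W) = \sigma^2(W). \tag{8.12}$$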







From (8.12) and (8.10) we have

$$\frac{\sigma(U, W)}{\sigma(U)\sigma(W)} = \frac{\sigma(W)}{\sigma(U)} = \left\{ \frac{n^2\sigma^2(W)}{n^2\sigma^2(W) + O(1)} \right\}^{1/2}.$$

Comparing condition (8.4) with (8.7), we see that we must have $n^2\sigma^2(W) \to \infty$
as $n \to \infty$. This shows the truth of (8.5). The proof of Theorem 8.1 is complete.

For some purposes the following corollary of Theorem 8.1 will be useful, where
the conditions (8.2), (8.3), and (8.4) are replaced by other conditions which are
more restrictive, but easier to apply.

THEOREM 8.2. Theorem 8.1 holds if the conditions (8.2), (8.3), and (8.4) are
replaced by the following:
There exist two positive numbers C, D such that

$$\int \cdots \int |\Phi^3(x_1, \dots, x_m)| \, dF_{\alpha_1}(x_1) \cdots dF_{\alpha_m}(x_m) < C \tag{8.13}$$

for $\alpha_i = 1, 2, \dots$, $(i = 1, \dots, m)$, and

$$\zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\,\beta_1, \dots, \beta_{m-1}} > D \tag{8.14}$$

for any subscripts satisfying

$$1 \le \alpha_1 < \alpha_2 < \dots < \alpha_{m-1}, \quad 1 \le \beta_1 < \beta_2 < \dots < \beta_{m-1}, \quad 1 \le \nu \ne \alpha_i, \beta_i.$$


We have to show that (8.2), (8.3), and (8.4) follow from (8.13) and (8.14).

(8.13) implies (8.2) by the inequality for moments. By a reasoning analogous
to that used in the previous proof, applying Hölder's inequality instead of
Schwarz's inequality, it follows from (8.13) that

$$E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 < C'. \tag{8.15}$$

On the other hand, by (8.7), (8.8), and (8.14),

$$\sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} > nD. \tag{8.16}$$

(8.15) and (8.16) are sufficient for the fulfillment of (8.4).


9. Applications to particular statistics. 

(a) Moments and functions of moments. It has been seen in section 4 that the 
k-statistics and the unbiased estimates of moments are U-statistics, while the 
sample moments are regular functionals of the sample d.f. By Theorems 7.1, 
8.1, and 7.4 these statistics are asymptotically normally distributed, and by 
Theorem 7.5 the same is true for a function of moments, if the respective condi- 
tions are satisfied. These results are not new (cf., for example, Cramér [2]). 

(b) Mean difference and coefficient of concentration. If $Y_1, \dots, Y_n$ are n independent
real-valued random variables, Gini's mean difference (without repetition)
is defined by

$$d = \frac{1}{n(n-1)} \sum_{\alpha \ne \beta} |Y_\alpha - Y_\beta|.$$

If the $Y_\alpha$'s have the same distribution F, the mean of d is

$$\delta = \int\!\!\int |y_1 - y_2| \, dF(y_1) \, dF(y_2),$$

and the variance, by (5.13), is

$$\sigma^2(d) = \frac{2}{n(n-1)} \{2\zeta_1(\delta)(n - 2) + \zeta_2(\delta)\},$$

where

$$\zeta_1(\delta) = \int \Big\{ \int |y_1 - y_2| \, dF(y_2) \Big\}^2 dF(y_1) - \delta^2, \tag{9.1}$$

$$\zeta_2(\delta) = \int\!\!\int (y_1 - y_2)^2 \, dF(y_1) \, dF(y_2) - \delta^2 = 2\sigma^2(Y) - \delta^2. \tag{9.2}$$

The notation $\zeta_1(\delta)$, $\zeta_2(\delta)$ serves to indicate the relation of these functionals of
F to the functional $\delta(F)$; $\delta$ is here merely the symbol of the functional, not a particular
value of it. In a similar way we shall write $\Phi(y_1, y_2 \mid \delta) = |y_1 - y_2|$,
etc. When there is danger of confusing $\zeta_1(\delta)$ with $\zeta_1(F)$, we may write $\zeta_1(F \mid \delta)$.

U. S. Nair [19] has evaluated $\sigma^2(d)$ for several particular distributions.

By Theorem 7.1, $\sqrt{n}\,(d - \delta)$ is asymptotically normal if $\zeta_2(\delta)$ exists.
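
As a concrete illustration of the mean difference and its variance formula, the following sketch computes $d$ for a sample and evaluates $\sigma^2(d)$ exactly. The values $\zeta_1(\delta) = 1/180$ and $\zeta_2(\delta) = 1/18$ for the uniform distribution on (0, 1) are worked out here from (9.1) and (9.2) (with $\delta = 1/3$) and are not quoted from the paper; the function names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def gini_mean_difference(ys):
    """Mean of |Y_a - Y_b| over all ordered pairs of distinct indices."""
    n = len(ys)
    total = sum(abs(a - b) for a, b in permutations(ys, 2))
    return total / (n * (n - 1))


def variance_of_d(n, zeta1, zeta2):
    """sigma^2(d) = 2 / (n (n - 1)) * {2 zeta1 (n - 2) + zeta2}."""
    return Fraction(2, n * (n - 1)) * (2 * zeta1 * (n - 2) + zeta2)


# Assumed values for the uniform distribution on (0, 1), derived by hand.
ZETA1_UNIFORM = Fraction(1, 180)
ZETA2_UNIFORM = Fraction(1, 18)

print(gini_mean_difference([1, 2, 4]))
print(variance_of_d(10, ZETA1_UNIFORM, ZETA2_UNIFORM))
```

Under these assumed values the formula simplifies to $(n + 3)/\{45\,n(n - 1)\}$, which the exact arithmetic reproduces.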

If $Y_1, \dots, Y_n$ do not assume negative values, the coefficient of concentration
(cf. Gini [8]) is defined by

$$G = \frac{d}{2\bar{Y}},$$

where $\bar{Y} = \sum Y_\alpha / n$. G is a function of two U-statistics. If the $Y_\alpha$'s are identically
distributed, if $E\{Y^2\}$ exists, and if $\mu = E\{Y\} > 0$, then, by Theorem 7.5,
$\sqrt{n}\,(G - \delta/2\mu)$ tends to be normally distributed with mean 0 and variance

$$\frac{\delta^2}{4\mu^4} \zeta_1(\mu) - \frac{\delta}{\mu^3} \zeta_1(\mu, \delta) + \frac{1}{\mu^2} \zeta_1(\delta),$$

where

$$\zeta_1(\mu) = \int y^2 \, dF(y) - \mu^2 = \sigma^2(Y),$$

$$\zeta_1(\mu, \delta) = \int\!\!\int y_1 |y_1 - y_2| \, dF(y_1) \, dF(y_2) - \mu\delta,$$

and $\zeta_1(\delta)$ is given by (9.1).
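
The variance above is the quadratic form of Theorem 7.5 with $h(y^{(1)}, y^{(2)}) = y^{(1)}/(2y^{(2)})$, $m = (2, 1)$ for $(d, \bar{Y})$, and gradient $(1/(2\mu), -\delta/(2\mu^2))$. The sketch below, with hypothetical function names and arbitrary test values that are assumptions of this illustration, checks that the general quadratic form and the closed expression agree.

```python
from fractions import Fraction


def delta_method_variance(m, grad, zeta):
    """Quadratic form of Theorem 7.5: sum m_i m_j h_i h_j zeta_1^(i,j)."""
    g = len(m)
    return sum(m[i] * m[j] * grad[i] * grad[j] * zeta[i][j]
               for i in range(g) for j in range(g))


def gini_coefficient_variance(mu, delta, z_mu, z_mu_delta, z_delta):
    """Closed form of the limiting variance of sqrt(n) (G - delta / (2 mu))."""
    return (delta ** 2 / (4 * mu ** 4) * z_mu
            - delta / mu ** 3 * z_mu_delta
            + z_delta / mu ** 2)


# Arbitrary illustrative values (not from the paper).
mu, delta = Fraction(1, 2), Fraction(1, 3)
z_mu, z_mu_delta, z_delta = Fraction(1, 12), Fraction(1, 50), Fraction(1, 180)

grad = (1 / (2 * mu), -delta / (2 * mu ** 2))        # partial derivatives of d / (2 Ybar)
zeta = [[z_delta, z_mu_delta], [z_mu_delta, z_mu]]   # order: (d, Ybar)
assert delta_method_variance((2, 1), grad, zeta) == gini_coefficient_variance(
    mu, delta, z_mu, z_mu_delta, z_delta)
```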
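(c) Functions of ranks and of the signs of variate differences. Let s(u) be the
signum function,

$$s(u) = \begin{cases} -1 & \text{if } u < 0; \\ 0 & \text{if } u = 0; \\ 1 & \text{if } u > 0, \end{cases} \tag{9.3}$$

and let

$$c(u) = \tfrac12\{1 + s(u)\} = \begin{cases} 0 & \text{if } u < 0; \\ \tfrac12 & \text{if } u = 0; \\ 1 & \text{if } u > 0. \end{cases} \tag{9.4}$$

If

$$x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n)$$

is a sample of n vectors of r components, we may define the rank $R_\alpha^{(i)}$ of $x_\alpha^{(i)}$ by

$$R_\alpha^{(i)} = \tfrac12 + \sum_{\beta=1}^{n} c(x_\alpha^{(i)} - x_\beta^{(i)}) = \frac{n+1}{2} + \frac12 \sum_{\beta=1}^{n} s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (i = 1, \dots, r). \tag{9.5}$$

If the numbers $x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)}$ are all different, the smallest of them
has rank 1, the next smallest rank 2, etc. If some of them are equal, the rank
as defined by (9.5) is known as the mid-rank.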
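
Definition (9.5) translates directly into code. The following is a small sketch for one component of the sample; the function names `c` and `midranks` are chosen here for illustration.

```python
def c(u):
    """The function c(u) of (9.4): 0, 1/2 or 1 according to the sign of u."""
    if u < 0:
        return 0.0
    if u == 0:
        return 0.5
    return 1.0


def midranks(xs):
    """Ranks by (9.5): R_a = 1/2 + sum over b of c(x_a - x_b)."""
    return [0.5 + sum(c(xa - xb) for xb in xs) for xa in xs]


print(midranks([3.0, 1.0, 2.0]))   # distinct values give the ordinary ranks
print(midranks([5.0, 5.0, 7.0]))   # tied values share their mid-rank
```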

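Any function of the ranks is a function of expressions $c(x_\alpha^{(i)} - x_\beta^{(i)})$ or
$s(x_\alpha^{(i)} - x_\beta^{(i)})$.

Conversely, since

$$s(x_\alpha^{(i)} - x_\beta^{(i)}) = s(R_\alpha^{(i)} - R_\beta^{(i)}),$$

any function of expressions $s(x_\alpha^{(i)} - x_\beta^{(i)})$ or $c(x_\alpha^{(i)} - x_\beta^{(i)})$ is a function of the
ranks.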

Consider a regular functional $\theta(F)$ whose kernel $\Phi(x_1, \dots, x_m)$ depends only
on the signs of the variate differences,

$$s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (\alpha, \beta = 1, \dots, m; \ i = 1, \dots, r). \tag{9.6}$$

The corresponding U-statistic is a function of the ranks of the sample variates.
The function $\Phi$ can take only a finite number of values, $c_1, \dots, c_N$, say. If
$\pi_i = P\{\Phi = c_i\}$, $(i = 1, \dots, N)$, we have

$$\theta = c_1\pi_1 + \dots + c_N\pi_N, \qquad \sum_{i=1}^{N} \pi_i = 1.$$

$\pi_i$ is a regular functional whose kernel $\Phi_i(x_1, \dots, x_m)$ is equal to 1 or 0 according
to whether $\Phi = c_i$ or $\ne c_i$. We have

$$\Phi = c_1\Phi_1 + \dots + c_N\Phi_N.$$

In order that $\theta(F)$ exist, the $c_i$ must be finite, and hence $\Phi$ is bounded. Therefore,
$E\{\Phi^2\}$ exists, and if $X_1, X_2, \dots$ are identically distributed, the d.f. of
$\sqrt{n}\,(U - \theta)$ tends, by Theorem 7.1, to a normal d.f. which is non-singular if
$\zeta_1 > 0$.


In the following we shall consider several examples of such functionals. 













(d) Difference sign correlation. Consider the bivariate sample

$$(x_1^{(1)}, x_1^{(2)}), \ (x_2^{(1)}, x_2^{(2)}), \ \dots, \ (x_n^{(1)}, x_n^{(2)}). \tag{9.7}$$

To each two members of this sample corresponds a pair of signs of the differences
of the respective variables,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \ s(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha \ne \beta; \ \alpha, \beta = 1, \dots, n). \tag{9.8}$$

(9.8) is a population of n(n - 1) pairs of difference signs. Since

$$\sum_{\alpha \ne \beta} s(x_\alpha^{(i)} - x_\beta^{(i)}) = 0, \qquad (i = 1, 2),$$

the covariance t of the difference signs (9.8) is

$$t = \frac{1}{n(n-1)} \sum_{\alpha \ne \beta} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\beta^{(2)}). \tag{9.9}$$

t will be briefly referred to as the difference sign covariance of the sample (9.7).
If all $x^{(1)}$'s and all $x^{(2)}$'s are different, we have

$$\sum_{\alpha \ne \beta} s^2(x_\alpha^{(i)} - x_\beta^{(i)}) = n(n - 1), \qquad (i = 1, 2),$$

and then t is the product moment correlation of the difference signs.
It is easily seen that t is a linear function of the number of inversions in the
permutation of the ranks of $x^{(1)}$ and $x^{(2)}$.
The statistic t has been considered by Esscher [6], Lindeberg [15], [16], Kendall
[12], and others.
t is a U-statistic. As a function of a random sample from a bivariate population,
t is an unbiased estimate of the regular functional of degree 2,

$$\tau = \int\!\!\int\!\!\int\!\!\int s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_2^{(2)}) \, dF(x_1) \, dF(x_2). \tag{9.10}$$

$\tau$ is the covariance of the signs of differences of the corresponding components
of $X_1 = (X_1^{(1)}, X_1^{(2)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)})$ in the population of pairs of independent
vectors $X_1, X_2$ with identical d.f. $F(x) = F(x^{(1)}, x^{(2)})$. If $F(x^{(1)}, x^{(2)})$
is continuous, $\tau$ is the product moment correlation of the difference signs.

Two points (or vectors), $(x_1^{(1)}, x_1^{(2)})$ and $(x_2^{(1)}, x_2^{(2)})$, are called concordant or
discordant according to whether

$$(x_1^{(1)} - x_2^{(1)})(x_1^{(2)} - x_2^{(2)})$$

is positive or negative. If $\pi^{(c)}$ and $\pi^{(d)}$ are the probabilities that a pair of vectors
drawn at random from the population is concordant or discordant, respectively,
we have from (9.10)

$$\tau = \pi^{(c)} - \pi^{(d)}.$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\pi^{(c)} + \pi^{(d)} = 1$, and hence

$$\tau = 2\pi^{(c)} - 1 = 1 - 2\pi^{(d)}. \tag{9.11}$$












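If we put

$$\bar{F}(x^{(1)}, x^{(2)}) = \tfrac14 \{F(x^{(1)} - 0, x^{(2)} - 0) + F(x^{(1)} - 0, x^{(2)} + 0) + F(x^{(1)} + 0, x^{(2)} - 0) + F(x^{(1)} + 0, x^{(2)} + 0)\}, \tag{9.12}$$

we have

$$\Phi_1(x \mid \tau) = 1 - 2\bar{F}(x^{(1)}, \infty) - 2\bar{F}(\infty, x^{(2)}) + 4\bar{F}(x^{(1)}, x^{(2)}), \tag{9.13}$$

and we may write

$$\tau = E\{\Phi_1(X_1 \mid \tau)\}. \tag{9.14}$$

The variance of t is, by (5.13),

$$\sigma^2(t) = \frac{2}{n(n-1)} \{2\zeta_1(\tau)(n - 2) + \zeta_2(\tau)\}, \tag{9.15}$$

where

$$\zeta_1(\tau) = E\{\Phi_1^2(X_1 \mid \tau)\} - \tau^2, \tag{9.16}$$

$$\zeta_2(\tau) = E\{s^2(X_1^{(1)} - X_2^{(1)})\,s^2(X_1^{(2)} - X_2^{(2)})\} - \tau^2. \tag{9.17}$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\zeta_2(\tau) = 1 - \tau^2$, and $\bar{F}(x^{(1)}, x^{(2)})$ in (9.13)
may be replaced by $F(x^{(1)}, x^{(2)})$.

The variance of a linear function of t has been given for the continuous case by
Lindeberg [15], [16].

If $X^{(1)}$ and $X^{(2)}$ are independent and have a continuous d.f., we find $\zeta_1(\tau) = \tfrac19$,
$\zeta_2(\tau) = 1$, and hence

$$\sigma^2(t) = \frac{2(2n + 5)}{9n(n - 1)}. \tag{9.18}$$

In this case the distribution of t is independent of the univariate distributions
of $X^{(1)}$ and $X^{(2)}$. This is, however, no longer true if the independent variables
are discontinuous. Then it appears that $\sigma^2(t)$ depends on $P\{X_1^{(i)} = X_2^{(i)}\}$
and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.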
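
For small n the variance (9.18) can be confirmed by brute force, since under independence and continuity all n! permutations of the ranks are equally probable. The sketch below enumerates them with exact rational arithmetic; the function names are hypothetical, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    return (u > 0) - (u < 0)


def difference_sign_covariance(x1, x2):
    """The statistic t of (9.9), computed exactly."""
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[b])
                for a in range(n) for b in range(n) if a != b)
    return Fraction(total, n * (n - 1))


def exact_variance_under_independence(n):
    """Variance of t when every permutation of the ranks is equally likely."""
    base = list(range(n))
    values = [difference_sign_covariance(base, list(p)) for p in permutations(base)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def formula_9_18(n):
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


for n in (3, 4, 5):
    assert exact_variance_under_independence(n) == formula_9_18(n)
```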

By Theorem 7.1, the d.f. of $\sqrt{n}\,(t - \tau)$ tends to the normal form. This result
has first been obtained for the particular case that all permutations of the ranks
of $X^{(1)}$ and $X^{(2)}$ are equally probable, which corresponds to the independence
of the continuous random variables $X^{(1)}$, $X^{(2)}$ (Kendall [12]). In this case t can
be represented as a sum of independent random variables (cf. Dantzig [5] and
Feller [7]). In the general case the asymptotic normality of t has been shown
by Daniels and Kendall [4] and the author [10].

The functional $\tau(F)$ is stationary (and hence the normal limiting distribution
of $\sqrt{n}\,(t - \tau)$ singular) if $\zeta_1 = 0$, which, in the case of a continuous F, means that
the equation $\Phi_1(X \mid \tau) = \tau$ or

$$4F(X^{(1)}, X^{(2)}) = 2F(X^{(1)}, \infty) + 2F(\infty, X^{(2)}) - 1 + \tau \tag{9.19}$$

is satisfied with probability 1. This is the case if $X^{(2)}$ is an increasing function
of $X^{(1)}$. Then $t = \tau = 1$ with probability 1, and $\sigma^2(t) = 0$. A case where (9.19)
is fulfilled and $\sigma^2(t) > 0$ is the following: $X^{(1)}$ is uniformly distributed in the
interval (0, 1), and

$$X^{(2)} = X^{(1)} + \tfrac12 \ \text{ if } 0 \le X^{(1)} < \tfrac12, \qquad X^{(2)} = X^{(1)} - \tfrac12 \ \text{ if } \tfrac12 \le X^{(1)} \le 1. \tag{9.20}$$

In this case $\tau = 0$, $\zeta_2 = 1$, $\sigma^2(t) = 2/n(n - 1)$.
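
Example (9.20) can be explored numerically. In the sketch below (function names and the seed are arbitrary choices of this illustration), two points are concordant exactly when they fall in the same half of (0, 1) and discordant otherwise, so t is determined by the number of sample points in each half; the code checks this against a direct evaluation of (9.9).

```python
import random
from math import comb


def sign(u):
    return (u > 0) - (u < 0)


def shifted(x):
    """X^(2) as a function of X^(1) in example (9.20)."""
    return x + 0.5 if x < 0.5 else x - 0.5


def difference_sign_covariance(points):
    """The statistic t of (9.9), summed over unordered pairs."""
    n = len(points)
    total = 0
    for a in range(n):
        for b in range(a + 1, n):
            total += (sign(points[a][0] - points[b][0])
                      * sign(points[a][1] - points[b][1]))
    return 2 * total / (n * (n - 1))


def predicted_t(xs):
    """Same-half pairs are concordant, cross-half pairs discordant."""
    n = len(xs)
    n1 = sum(1 for x in xs if x < 0.5)
    n2 = n - n1
    return (comb(n1, 2) + comb(n2, 2) - n1 * n2) / comb(n, 2)


random.seed(0)
xs = [random.random() for _ in range(200)]
t = difference_sign_covariance([(x, shifted(x)) for x in xs])
print(t, predicted_t(xs))
```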
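(e) Rank correlation and grade correlation. If in the sample $\{(x_\alpha^{(1)}, x_\alpha^{(2)})\}$,
$(\alpha = 1, \dots, n)$, all $x^{(1)}$'s and all $x^{(2)}$'s are different, the rank correlation
coefficient, which we denote by $k'$, is given by

$$k' = \frac{12}{n^3 - n} \sum_{\alpha=1}^{n} \Big(R_\alpha^{(1)} - \frac{n+1}{2}\Big)\Big(R_\alpha^{(2)} - \frac{n+1}{2}\Big).$$

Inserting (9.5) we have

$$k' = \frac{3}{n^3 - n} \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \sum_{\gamma=1}^{n} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)})$$

or

$$k' = \frac{(n - 2)k + 3t}{n + 1}, \tag{9.21}$$

where t is the difference sign covariance (9.9), and

$$k = \frac{3}{n(n-1)(n-2)} \sum_{\alpha \ne \beta \ne \gamma} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

the summation being over all different subscripts $\alpha, \beta, \gamma$.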
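
Relation (9.21) is an exact algebraic identity when all values are different, and it can be checked with rational arithmetic. The sketch below uses hypothetical function names and an arbitrary small sample; it computes $k'$ from the ranks, t from (9.9), and k from its triple sum.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    return (u > 0) - (u < 0)


def ranks(xs):
    """Ordinary ranks, assuming all values are different."""
    return [1 + sum(1 for y in xs if y < x) for x in xs]


def k_prime(x1, x2):
    """Rank correlation coefficient computed from the ranks."""
    n = len(x1)
    centre = Fraction(n + 1, 2)
    r1, r2 = ranks(x1), ranks(x2)
    return Fraction(12, n ** 3 - n) * sum((a - centre) * (b - centre)
                                          for a, b in zip(r1, r2))


def t_stat(x1, x2):
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[b])
                for a, b in permutations(range(n), 2))
    return Fraction(total, n * (n - 1))


def k_stat(x1, x2):
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[c])
                for a, b, c in permutations(range(n), 3))
    return Fraction(3 * total, n * (n - 1) * (n - 2))


x1 = [0.3, 1.2, 0.7, 2.5, 1.9]
x2 = [1.1, 0.4, 2.2, 0.9, 3.0]
n = len(x1)
assert k_prime(x1, x2) == ((n - 2) * k_stat(x1, x2) + 3 * t_stat(x1, x2)) / (n + 1)
```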
k is a U-statistic, and as a function of a random sample from a population with
d.f. F, k is an unbiased estimate of the regular functional of degree 3,

$$\kappa = 3 \int\!\!\int\!\!\int s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_3^{(2)}) \, dF(x_1) \, dF(x_2) \, dF(x_3) = 3 \int\!\!\int \{2F^{(1)}(x^{(1)}) - 1\}\{2F^{(2)}(x^{(2)}) - 1\} \, dF(x), \tag{9.22}$$

where $F^{(1)}(x) = F(x, \infty)$, $F^{(2)}(x) = F(\infty, x)$.
If F is continuous, we have

$$\int F^{(i)}(y) \, dF^{(i)}(y) = \int_0^1 u \, du = \tfrac12, \qquad \int \{F^{(i)}(y) - \tfrac12\}^2 \, dF^{(i)}(y) = \int_0^1 (u - \tfrac12)^2 \, du = \tfrac{1}{12}, \qquad (i = 1, 2),$$

and in this case $\kappa$ is the coefficient of correlation between the random variables

$$U^{(1)} = F^{(1)}(X^{(1)}), \qquad U^{(2)} = F^{(2)}(X^{(2)}).$$

$U^{(i)}$ has been termed the grade of the continuous variable $X^{(i)}$, and in the general
case $F^{(i)}(X^{(i)})$ may be called the grade of $X^{(i)}$ (cf., for instance, G. U. Yule and
M. G. Kendall [22, p. 150]). In general, $\kappa$ is 12 times the covariance of the
grades.

From (9.21) we have for the expected value of $k'$,

$$E\{k'\} = \frac{(n - 2)\kappa + 3\tau}{n + 1}.$$

In the continuous case the rank correlation coefficient $k'$ is an estimate of the
grade correlation $\kappa$, which is biased for finite n but unbiased in the limit.


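The kernel $3s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_3^{(2)})$ of $\kappa$ is not symmetric. Denoting by
$\Phi(x_1, x_2, x_3 \mid \kappa)$ the symmetric kernel of $\kappa$, we have

$$\Phi(x_1, x_2, x_3 \mid \kappa) = \frac12 \sum_{\alpha \ne \beta \ne \gamma}^{1,2,3} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)}). \tag{9.23}$$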


For computing $\kappa$ and the constants $\zeta_i$ an alternative expression for $\kappa$ and $\Phi$ is
sometimes more convenient. From three two-dimensional vectors $x_1, x_2, x_3$
we can form three pairs $(x_1, x_2)$, $(x_1, x_3)$, and $(x_2, x_3)$. The number of concordant
pairs among them can be 3, 2, 1, or 0. If $\gamma$ is the probability that among
the three pairs formed from three random elements of the population at least 2
are concordant, we have, if the d.f. F is continuous,

$$\kappa = 2\gamma - 1. \tag{9.24}$$

This is analogous to the expression (9.11) for $\tau$.
The truth of (9.24) can be seen as follows: From the definition of $\gamma$ we have

$$\gamma = E\{\Phi(X_1, X_2, X_3 \mid \gamma)\},$$

where $\Phi(x_1, x_2, x_3 \mid \gamma)$ is $= 1$ if at least two of the three expressions

$$(x_\alpha^{(1)} - x_\beta^{(1)})(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha < \beta; \ \alpha, \beta = 1, 2, 3) \tag{9.25}$$

are positive, and equal to zero, if no more than one of them is positive. Since,
by the continuity of F, we may neglect the case of (9.25) being zero, we may
write

$$\Phi(x_1, x_2, x_3 \mid \gamma) = c_{12,12}c_{23,23}c_{31,31} + c_{12,12}c_{23,23}c_{31,13} + c_{12,12}c_{23,32}c_{31,31} + c_{12,21}c_{23,23}c_{31,31},$$

where

$$c_{\alpha\beta,\gamma\delta} = c[(x_\alpha^{(1)} - x_\beta^{(1)})(x_\gamma^{(2)} - x_\delta^{(2)})]$$

and c(u) is defined by (9.4).
$\Phi(x_1, x_2, x_3 \mid \gamma)$ is symmetric in $x_1, x_2, x_3$.
The identity

$$\Phi(x_1, x_2, x_3 \mid \kappa) = 2\Phi(x_1, x_2, x_3 \mid \gamma) - 1 \tag{9.26}$$

can be shown to hold either by algebraical calculation using (9.4) or by direct
computation of each side for the different positions of the three points $x_1, x_2, x_3$.
From (9.26) it appears that in the continuous case the symmetric kernel
$\Phi(x_1, x_2, x_3 \mid \kappa)$ can assume only two values, -1 and +1.
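
The "direct computation" route to (9.26) can be imitated numerically. The sketch below (hypothetical function names, arbitrary seed) evaluates the symmetric kernel (9.23) and the indicator kernel of $\gamma$ on randomly placed triples of points without ties and checks that the identity holds on each of them.

```python
import random
from itertools import combinations, permutations


def sign(u):
    return (u > 0) - (u < 0)


def kernel_kappa(p1, p2, p3):
    """Symmetric kernel (9.23): half the sum over the 6 orderings of (a, b, c)."""
    pts = (p1, p2, p3)
    total = sum(sign(pts[a][0] - pts[b][0]) * sign(pts[a][1] - pts[c][1])
                for a, b, c in permutations(range(3)))
    return total / 2


def kernel_gamma(p1, p2, p3):
    """1 if at least two of the three pairs are concordant, else 0."""
    concordant = sum(1 for p, q in combinations((p1, p2, p3), 2)
                     if sign(p[0] - q[0]) * sign(p[1] - q[1]) > 0)
    return 1 if concordant >= 2 else 0


random.seed(1)
for _ in range(500):
    triple = [(random.random(), random.random()) for _ in range(3)]
    assert kernel_kappa(*triple) == 2 * kernel_gamma(*triple) - 1
```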
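The variance of k is, according to (5.13),

$$\sigma^2(k) = \frac{6}{n(n-1)(n-2)} \left\{ 3\binom{n-3}{2}\zeta_1(\kappa) + 3(n - 3)\zeta_2(\kappa) + \zeta_3(\kappa) \right\},$$

where

$$\zeta_1(\kappa) = E\{\Phi_1^2(X_1 \mid \kappa)\} - \kappa^2,$$
$$\zeta_2(\kappa) = E\{\Phi_2^2(X_1, X_2 \mid \kappa)\} - \kappa^2,$$
$$\zeta_3(\kappa) = E\{\Phi^2(X_1, X_2, X_3 \mid \kappa)\} - \kappa^2,$$
$$\Phi_1(x_1 \mid \kappa) = E\{\Phi(x_1, X_2, X_3 \mid \kappa)\},$$
$$\Phi_2(x_1, x_2 \mid \kappa) = E\{\Phi(x_1, x_2, X_3 \mid \kappa)\}.$$

We find for the continuous case

$$\zeta_3(\kappa) = 1 - \kappa^2,$$

$$\Phi_1(x_1 \mid \kappa) = [1 - 2F(x_1^{(1)}, \infty)][1 - 2F(\infty, x_1^{(2)})] - 2F(x_1^{(1)}, \infty) - 2F(\infty, x_1^{(2)}) + 4\int F(x_1^{(1)}, y^{(2)}) \, dF(\infty, y^{(2)}) + 4\int F(y^{(1)}, x_1^{(2)}) \, dF(y^{(1)}, \infty), \tag{9.27}$$

$$\Phi_2(x_1, x_2 \mid \kappa) = 1 + 2F(x_1^{(1)}, x_2^{(2)}) + 2F(x_2^{(1)}, x_1^{(2)}) - 2c(x_2^{(2)} - x_1^{(2)})F(x_1^{(1)}, \infty) - 2c(x_1^{(2)} - x_2^{(2)})F(x_2^{(1)}, \infty) - 2c(x_2^{(1)} - x_1^{(1)})F(\infty, x_1^{(2)}) - 2c(x_1^{(1)} - x_2^{(1)})F(\infty, x_2^{(2)}).$$

If $X^{(1)}$, $X^{(2)}$ are continuous and independent, we obtain $\kappa = 0$, $\zeta_1 = \tfrac19$, $\zeta_2 = \tfrac{7}{18}$,
$\zeta_3 = 1$, and hence

$$\sigma^2(k) = \frac{n^2 - 3}{n(n-1)(n-2)}. \tag{9.28}$$

In the discontinuous case of independence the distribution of k, as that of t,
depends on the distributions of $X^{(1)}$ and $X^{(2)}$, and $\sigma^2(k)$ can again be expressed
in terms of $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.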

The variance of the rank correlation coefficient $k'$ is, by (9.21),

$$\sigma^2(k') = \frac{(n - 2)^2\sigma^2(k) + 6(n - 2)\sigma(t, k) + 9\sigma^2(t)}{(n + 1)^2}. \tag{9.29}$$

For $\sigma(t, k)$ we have, according to (6.5),

$$\sigma(t, k) = \frac{6}{n(n-1)} \{(n - 3)\zeta_1(\tau, \kappa) + \zeta_2(\tau, \kappa)\},$$

where

$$\zeta_1(\tau, \kappa) = E\{\Phi_1(X_1 \mid \tau)\Phi_1(X_1 \mid \kappa)\} - \tau\kappa,$$

$$\zeta_2(\tau, \kappa) = E\{\Phi(X_1, X_2 \mid \tau)\Phi_2(X_1, X_2 \mid \kappa)\} - \tau\kappa.$$

In the case of independence we see from (9.13) and (9.27) that

$$\Phi_1(x \mid \tau) = \Phi_1(x \mid \kappa) = [1 - 2F(x^{(1)}, \infty)][1 - 2F(\infty, x^{(2)})],$$

and we obtain

$$\zeta_1(\tau, \kappa) = \zeta_1(\kappa) = \zeta_1(\tau) = \tfrac19, \tag{9.30}$$

$$\zeta_2(\tau, \kappa) = \tfrac59,$$

$$\sigma(t, k) = \frac{2(n + 2)}{3n(n - 1)}. \tag{9.31}$$

On inserting (9.28), (9.31) and (9.18) in (9.29), we find

$$\sigma^2(k') = \frac{1}{n - 1},$$

in accordance with the result obtained for this case by Student and published
by K. Pearson [20].
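
The substitution of (9.28), (9.31) and (9.18) into (9.29) is a short piece of algebra that can be checked with exact fractions. The following sketch, with function names chosen only for this illustration, evaluates each expression for a range of n and confirms that the result is $1/(n - 1)$.

```python
from fractions import Fraction


def var_t(n):
    """(9.18): variance of t under independence and continuity."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


def var_k(n):
    """(9.28): variance of k under independence and continuity."""
    return Fraction(n * n - 3, n * (n - 1) * (n - 2))


def cov_tk(n):
    """(9.31): covariance of t and k under independence and continuity."""
    return Fraction(2 * (n + 2), 3 * n * (n - 1))


def var_k_prime(n):
    """(9.29): variance of k' = ((n - 2) k + 3 t) / (n + 1)."""
    return ((n - 2) ** 2 * var_k(n) + 6 * (n - 2) * cov_tk(n)
            + 9 * var_t(n)) / (n + 1) ** 2


for n in range(3, 41):
    assert var_k_prime(n) == Fraction(1, n - 1)
```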

According to Theorem 7.1, $\sqrt{n}\,(k - \kappa)$ tends to be normally distributed with
mean 0 and variance $9\zeta_1(\kappa)$. The same is true for the distribution of the rank
correlation coefficient, $k'$, as follows from Theorem 7.3 in conjunction with
(9.21). For the special case of independence the asymptotic normality of $k'$
has been proved by Hotelling and Pabst [11].

From Theorem 7.3 it also follows that the joint distribution of $\sqrt{n}\,(t - \tau)$
and $\sqrt{n}\,(k - \kappa)$ (or $\sqrt{n}\,(k' - \kappa)$) tends to the normal form with the variances
$4\zeta_1(\tau)$ and $9\zeta_1(\kappa)$ and the covariance $6\zeta_1(\kappa, \tau)$. In the case of independence we
see from (9.30) that the correlation $\rho(t, k)$ between t and k tends to 1, and we have
the asymptotic functional relation $3t = 2k$. This result has been conjectured by
Kendall and others [14], and proved by Daniels [3]. In general, however, $\rho(t, k)$
does not approach unity. Thus, if $X^{(1)}$ is uniformly distributed in (0, 1), and

$$X^{(2)} = \begin{cases} \tfrac12 - X^{(1)} & \text{if } 0 \le X^{(1)} < \tfrac14, \\ \tfrac12 + X^{(1)} & \text{if } \tfrac14 \le X^{(1)} < \tfrac12, \\ X^{(1)} - \tfrac12 & \text{if } \tfrac12 \le X^{(1)} < \tfrac34, \\ \tfrac32 - X^{(1)} & \text{if } \tfrac34 \le X^{(1)} \le 1, \end{cases} \tag{9.32}$$

we have $\tau = \kappa = 0$, $\zeta_1(\tau) = 0$, $\zeta_2(\tau) = 1$, $\zeta_1(\kappa) = \tfrac{1}{16}$, $\zeta_1(\kappa, \tau) = 0$, and hence
$\rho(t, k) \to 0$.
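
A small simulation gives a feel for example (9.32). The sketch below is only an illustration under stated assumptions: the piecewise map follows (9.32) as reconstructed above, and the sample size, seed, and function names are arbitrary choices. With $\tau = \kappa = 0$, both t and $k'$ should come out near zero, t much more tightly so because $\zeta_1(\tau) = 0$.

```python
import random


def sign(u):
    return (u > 0) - (u < 0)


def g(x):
    """X^(2) as a function of X^(1) in example (9.32)."""
    if x < 0.25:
        return 0.5 - x
    if x < 0.5:
        return 0.5 + x
    if x < 0.75:
        return x - 0.5
    return 1.5 - x


def kendall_t(points):
    """The statistic t of (9.9), summed over unordered pairs."""
    n = len(points)
    total = sum(sign(points[a][0] - points[b][0]) * sign(points[a][1] - points[b][1])
                for a in range(n) for b in range(a + 1, n))
    return 2 * total / (n * (n - 1))


def rank_correlation(points):
    """k' computed from ordinary ranks (values assumed distinct)."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    rx = [1 + sum(1 for v in xs if v < x) for x in xs]
    ry = [1 + sum(1 for v in ys if v < y) for y in ys]
    centre = (n + 1) / 2
    return 12 / (n ** 3 - n) * sum((a - centre) * (b - centre) for a, b in zip(rx, ry))


def simulate(n=400, seed=0):
    rng = random.Random(seed)
    points = [(x, g(x)) for x in (rng.random() for _ in range(n))]
    return kendall_t(points), rank_correlation(points)


print(simulate())
```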




























































































































































(f) Non-parametric tests of independence. Suppose that the random variables
$X^{(1)}$, $X^{(2)}$ have a continuous joint d.f. $F(x^{(1)}, x^{(2)})$, and we want to test the
hypothesis $H_0$ that $X^{(1)}$ and $X^{(2)}$ are independent, that is, that

$$F(x^{(1)}, x^{(2)}) = F(x^{(1)}, \infty)\,F(\infty, x^{(2)}).$$

The distribution of any statistic involving only the ranks of the variables
does not depend on the d.f. of the population when $H_0$ is true. For this reason
several rank order statistics, among them the difference sign correlation t and
the rank correlation $k'$, have been suggested for testing independence.

From the preceding results we can obtain the asymptotic power functions of
the tests of independence based on t and $k'$. If $H_0$ is true, we have $E\{t\} = \tau = 0$,
and the critical region of size $\epsilon$ of the t-test may be defined by $|t| > c_n$, where
$c_n$ is the smallest number satisfying the inequality

$$P\{|t| > c_n \mid H_0\} \le \epsilon. \tag{9.33}$$

By Theorem 7.2 and (9.18) we may write $c_n = 2\lambda_n/3\sqrt{n}$, where $\lambda_n$ tends to a
positive constant $\lambda$ depending on $\epsilon$.
Since $\sigma^2(t) = O(n^{-1})$, the power function

$$P_n(H) = P\{|t| > 2\lambda_n/3\sqrt{n} \mid H\}$$

tends to one as $n \to \infty$ for any alternative hypothesis H with $\tau(F) \ne 0$. If,
however, $\tau = 0$, we have $\lim P_n(H) < 1$. If $\tau = 0$ and $\zeta_1(\tau) < \tfrac19$, we have even
$\lim P_n(H) < \epsilon$, and with respect to these alternatives the test is biased in the
limit. Thus, in the case of the distribution (9.20) we have even $P_n(H) \to 0$.
In this case there is a functional relationship between the variables, and the
distribution must be considered as considerably different from the case of
independence.

For the rank correlation test we have a similar result. If $c_n'$ is the smallest
number satisfying $P\{|k'| > c_n' \mid H_0\} \le \epsilon$, we have $c_n' = \lambda_n'/\sqrt{n}$, where
$\lim \lambda_n' = \lambda$, and the test is biased in the limit if $\kappa = 0$ and $\zeta_1(\kappa) < \tfrac19$. This is
fulfilled in the case of the distribution (9.32), where $\zeta_1(\kappa) = \tfrac{1}{16}$.

The question arises whether there exist non-parametric tests of independence 
which are unbiased or unbiased in the limit. This point will be discussed in a 
separate paper on tests of independence. 


(g) Mann's test against trend. Let $Y_1, \dots, Y_n$ be n independent real-valued
random variables, $Y_\alpha$ having the continuous d.f. $F_\alpha(y)$, $(\alpha = 1, \dots, n)$.
The hypothesis of randomness,

$$H_1: \ F_1(y) = \dots = F_n(y),$$

is to be tested against the alternative hypothesis of a "downward trend,"

$$H_2: \ F_1(y) < F_2(y) < \dots < F_n(y).$$

H. B. Mann [17] has suggested a test of $H_1$ against $H_2$ based on the number T
of inequalities $Y_\alpha < Y_\beta$, where $\alpha < \beta$. We may write

$$2T - \frac{n(n-1)}{2} = \sum_{\alpha < \beta} s(Y_\beta - Y_\alpha) = \sum_{\alpha < \beta} s(\alpha - \beta)\,s(Y_\alpha - Y_\beta).$$

The U-statistic

$$t = \{4T/n(n - 1)\} - 1$$

is the same as (9.9) for the special case when one component is not a random
variable.
Let

$$\tau_{\alpha\beta} = s(\alpha - \beta) \int\!\!\int s(y_1 - y_2) \, dF_\alpha(y_1) \, dF_\beta(y_2) = s(\alpha - \beta) \Big\{ 2\int F_\beta(y) \, dF_\alpha(y) - 1 \Big\}.$$

We have $\tau_{\alpha\beta} = 0$ if $H_1$ is true and $\tau_{\alpha\beta} < 0$ if $H_2$ is true.
Since

$$E\{t\} = \tau_n = \frac{2}{n(n-1)} \sum_{\alpha < \beta} \tau_{\alpha\beta},$$

it follows that $E\{t\} = 0$ under $H_1$ and $E\{t\} < 0$ under $H_2$.

Mann's test against trend has the power function $P_n(H) = P\{t \le a_n \mid H\}$,
where $a_n$ is the largest number satisfying $P\{t \le a_n \mid H_1\} \le \epsilon$.

Since $a_n \to 0$ and, by (5.18), $\sigma^2(t) = O(n^{-1})$, it follows from Tchebycheff's
inequality that the test is consistent (that is, $P_n(H_2) \to 1$) and hence unbiased
in the limit. This has been shown by Mann who also gave sufficient conditions
under which the test is unbiased for finite n.
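
The count T and the derived statistic t are simple to compute. The sketch below, with hypothetical function names, counts the inequalities $Y_\alpha < Y_\beta$ for $\alpha < \beta$ and forms $t = 4T/n(n - 1) - 1$; a strictly increasing sequence gives $t = 1$ and a strictly decreasing one gives $t = -1$.

```python
from itertools import combinations


def mann_T(ys):
    """Number of pairs a < b (in time order) with Y_a < Y_b."""
    return sum(1 for a, b in combinations(range(len(ys)), 2) if ys[a] < ys[b])


def mann_t(ys):
    """t = 4 T / (n (n - 1)) - 1, the U-statistic form of Mann's count."""
    n = len(ys)
    return 4 * mann_T(ys) / (n * (n - 1)) - 1


print(mann_T([3.1, 2.4, 2.9, 1.0]), mann_t([3.1, 2.4, 2.9, 1.0]))
```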

By Theorems 8.1 and 8.2 the distribution of $(t - \tau_n)/\sigma(t)$, where $\tau_n = E\{t\}$, is asymptotically
normal if certain conditions are satisfied. Since (8.2), (8.3) and (8.13) are fulfilled,
either of the conditions (8.4) and (8.14) is sufficient.

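(h) The coefficient of partial difference sign correlation. Consider a three-variate
sample $x_1, \dots, x_n$; $x_\alpha = (x_\alpha^{(1)}, x_\alpha^{(2)}, x_\alpha^{(3)})$, $(\alpha = 1, \dots, n)$. In a similar
way as in section 9d we may form the set of the n(n - 1) triplets of difference
signs,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \ s(x_\alpha^{(2)} - x_\beta^{(2)}), \ s(x_\alpha^{(3)} - x_\beta^{(3)}), \qquad (\alpha \ne \beta; \ \alpha, \beta = 1, \dots, n). \tag{9.34}$$

We shall assume that all $x^{(1)}$'s, $x^{(2)}$'s, and $x^{(3)}$'s are different. Then the triplets
(9.34) contain only two different numbers, +1 and -1. Hence the regression
functions of the three-variate population (9.34) are linear.

If $t_{12}$, $t_{13}$, and $t_{23}$ are the difference sign correlations of $\{s(x_\alpha^{(1)} - x_\beta^{(1)}),
s(x_\alpha^{(2)} - x_\beta^{(2)})\}$, $\{s(x_\alpha^{(1)} - x_\beta^{(1)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ and $\{s(x_\alpha^{(2)} - x_\beta^{(2)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$
respectively, we have for the coefficient $t_{12.3}$ of partial correlation between
$s(x_\alpha^{(1)} - x_\beta^{(1)})$ and $s(x_\alpha^{(2)} - x_\beta^{(2)})$ with respect to $s(x_\alpha^{(3)} - x_\beta^{(3)})$,

$$t_{12.3} = \frac{t_{12} - t_{13}t_{23}}{\sqrt{(1 - t_{13}^2)(1 - t_{23}^2)}}. \tag{9.35}$$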







This measure of partial correlation has been suggested by Kendall [13] who
gave an alternative definition of $t_{12.3}$.
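
Formula (9.35) is a one-line computation once the three pairwise difference sign correlations are known. The following sketch uses a hypothetical function name and illustrative inputs; it assumes $t_{13}^2 \ne 1$ and $t_{23}^2 \ne 1$ so that the denominator does not vanish.

```python
from math import sqrt


def partial_difference_sign_correlation(t12, t13, t23):
    """t_{12.3} of (9.35); requires t13**2 != 1 and t23**2 != 1."""
    return (t12 - t13 * t23) / sqrt((1 - t13 ** 2) * (1 - t23 ** 2))


# When the third variable is uncorrelated with both others, t_{12.3} = t_{12}.
print(partial_difference_sign_correlation(0.4, 0.0, 0.0))
print(partial_difference_sign_correlation(0.5, 0.5, 0.5))
```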

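If we have two independent three-dimensional random vectors
$X_1 = (X_1^{(1)}, X_1^{(2)}, X_1^{(3)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)}, X_2^{(3)})$ with the same continuous
d.f. $F(x^{(1)}, x^{(2)}, x^{(3)})$, the distribution of the difference signs $s(X_1^{(i)} - X_2^{(i)})$,
$(i = 1, 2, 3)$, has again linear regression functions, and we may define the
partial difference sign correlation

$$\tau_{12.3} = \frac{\tau_{12} - \tau_{13}\tau_{23}}{\sqrt{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}},$$

where $\tau_{ij}$ is the difference sign correlation of $X^{(i)}$, $X^{(j)}$.

If $t_{12.3}$ is a function of a random sample, and if $\tau_{13}^2 \ne 1$, $\tau_{23}^2 \ne 1$, the d.f. of
$\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ tends, by Theorem 7.5, to the normal d.f. with mean zero and
variance

$$\sigma_{12.3}^2 = \frac{4}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)} \Big\{ \zeta_1(\tau_{12}) + \Big(\frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\Big)^2 \zeta_1(\tau_{13}) + \Big(\frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\Big)^2 \zeta_1(\tau_{23})$$

$$\qquad - 2\,\frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\,\zeta_1(\tau_{12}, \tau_{13}) - 2\,\frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\,\zeta_1(\tau_{12}, \tau_{23}) + 2\,\frac{(\tau_{23} - \tau_{12}\tau_{13})(\tau_{13} - \tau_{12}\tau_{23})}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}\,\zeta_1(\tau_{13}, \tau_{23}) \Big\},$$

where

$$\zeta_1(\tau_{ij}) = E\{\Phi_1^2(X \mid \tau_{ij})\} - \tau_{ij}^2,$$
$$\zeta_1(\tau_{ij}, \tau_{gh}) = E\{\Phi_1(X \mid \tau_{ij})\Phi_1(X \mid \tau_{gh})\} - \tau_{ij}\tau_{gh},$$

and, for instance (cf. (9.13)),

$$\Phi_1(x \mid \tau_{12}) = 1 - 2F(x^{(1)}, \infty, \infty) - 2F(\infty, x^{(2)}, \infty) + 4F(x^{(1)}, x^{(2)}, \infty).$$

If $\tau_{13} = \tau_{23} = 0$, we have

$$\sigma_{12.3}^2 = 4\zeta_1(\tau_{12}),$$

and $\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ has the same limiting distribution as $\sqrt{n}\,(t_{12} - \tau_{12})$. This is
in particular the case when $X^{(1)}$, $X^{(2)}$, $X^{(3)}$ are independent.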


REFERENCES

[1] H. CRAMÉR, Random Variables and Probability Distributions, Cambridge Tracts in
Math., Cambridge, 1937.

[2] H. CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] H. E. DANIELS, "The relation between measures of correlation in the universe of sample
permutations," Biometrika, Vol. 33 (1944), pp. 129-135.

[4] H. E. DANIELS AND M. G. KENDALL, "The significance of rank correlations where
parental correlation exists," Biometrika, Vol. 34 (1947), pp. 197-208.

[5] G. B. DANTZIG, "On a class of distributions that approach the normal distribution
function," Annals of Math. Stat., Vol. 10 (1939), pp. 247-253.

[6] F. ESSCHER, "On a method of determining correlation from the ranks of the variates,"
Skandinavisk Aktuar. tids., Vol. 7 (1924), pp. 201-219.

[7] W. FELLER, "The fundamental limit theorems in probability," Am. Math. Soc. Bull.,
Vol. 51 (1945), pp. 800-832.

[8] C. GINI, "Sulla misura della concentrazione e della variabilità dei caratteri," Atti del
R. Istituto Veneto di S.L.A., Vol. 73 (1913-14), Part 2.

[9] P. R. HALMOS, "The theory of unbiased estimation," Annals of Math. Stat., Vol. 17
(1946), pp. 34-43.

[10] W. HÖFFDING, "On the distribution of the rank correlation coefficient τ, when the
variates are not independent," Biometrika, Vol. 34 (1947), pp. 183-196.

[11] H. HOTELLING AND M. R. PABST, "Rank correlation and tests of significance involving
no assumptions of normality," Annals of Math. Stat., Vol. 7 (1936), pp. 29-43.

[12] M. G. KENDALL, "A new measure of rank correlation," Biometrika, Vol. 30 (1938),
pp. 81-93.

[13] M. G. KENDALL, "Partial rank correlation," Biometrika, Vol. 32 (1942), pp. 277-283.

[14] M. G. KENDALL, S. F. H. KENDALL, AND B. BABINGTON SMITH, "The distribution of
Spearman's coefficient of rank correlation in a universe in which all rankings
occur an equal number of times," Biometrika, Vol. 30 (1939), pp. 251-273.

[15] J. W. LINDEBERG, "Über die Korrelation," VI Skand. Matematikerkongres i København,
1925, pp. 437-446.

[16] J. W. LINDEBERG, "Some remarks on the mean error of the percentage of correlation,"
Nordic Statistical Journal, Vol. 1 (1929), pp. 137-141.

[17] H. B. MANN, "Nonparametric tests against trend," Econometrica, Vol. 13 (1945),
pp. 245-259.

[18] R. V. MISES, "On the asymptotic distribution of differentiable statistical functions,"
Annals of Math. Stat., Vol. 18 (1947), pp. 309-348.

[19] U. S. NAIR, "The standard error of Gini's mean difference," Biometrika, Vol. 28 (1936),
pp. 428-436.

[20] K. PEARSON, "On further methods of determining correlation," Drapers' Company
Research Memoirs, Biometric Series, IV, London, 1907.

[21] V. VOLTERRA, Theory of Functionals, Blackie, (authorized translation by Miss M.
Long), London and Glasgow, 1931.

[22] G. U. YULE AND M. G. KENDALL, An Introduction to the Theory of Statistics, Griffin,
11th Edition, London, 1937.








OPTIMUM CHARACTER OF THE SEQUENTIAL PROBABILITY RATIO 
TEST 


A. WALD AND J. WOLFOWITZ


Columbia University 


1. Summary. Let $S_0$ be any sequential probability ratio test for deciding between two simple alternatives $H_0$ and $H_1$, and $S_1$ another test for the same purpose. We define ($i, j = 0, 1$):

$\alpha_i(S_j)$ = probability, under $S_j$, of rejecting $H_i$ when it is true;

$E_i^j(n)$ = expected number of observations to reach a decision under test $S_j$ when the hypothesis $H_i$ is true. (It is assumed that $E_i^1(n)$ exists.)

In this paper it is proved that, if

$$\alpha_i(S_1) \leq \alpha_i(S_0) \qquad (i = 0, 1),$$

it follows that

$$E_i^0(n) \leq E_i^1(n) \qquad (i = 0, 1).$$

This means that of all tests with the same power the sequential probability ratio test requires on the average fewest observations. This result had been conjectured earlier ([1], [2]).


2. Introduction. Let $p_i(x)$, $i = 0, 1$, denote two different probability density functions or (discrete) probability functions. (Throughout this paper the index $i$ will always take the values 0, 1.) Let $X$ be a chance variable whose distribution can only be either $p_0(x)$ or $p_1(x)$, but is otherwise unknown. It is required to decide between the hypotheses $H_0$, $H_1$, where $H_i$ states that $p_i(x)$ is the distribution of $X$, on the basis of $n$ independent observations $x_1, \dots, x_n$ on $X$, where $n$ is a chance variable defined (finite) on almost every infinite sequence

$$\omega = x_1, x_2, \dots$$

i.e., $n$ is finite with probability one according to both $p_0(x)$ and $p_1(x)$. The definition of $n(\omega)$ together with the rule for deciding on $H_0$ or $H_1$ constitute a sequential test.

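A sequential probability ratio test is defined with the aid of two positive numbers, $A^* > 1$, $B^* < 1$, as follows: Write for brevity

$$p_{ij} = \prod_{k=1}^{j} p_i(x_k).$$

Then $n = j$ if either

$$\frac{p_{1j}}{p_{0j}} \geq A^* \quad \text{or} \quad \frac{p_{1j}}{p_{0j}} \leq B^*,$$

and

$$B^* < \frac{p_{1k}}{p_{0k}} < A^*, \qquad k < j.$$

If $p_{1n}/p_{0n} \geq A^*$, the hypothesis $H_1$ is accepted; if $p_{1n}/p_{0n} \leq B^*$, the hypothesis $H_0$ is accepted.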
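The stopping rule just described can be sketched in a few lines. This is a minimal illustration only: the function name `sprt`, the callables `p0` and `p1`, and the choice to return `None` when the supplied observations run out before a boundary is reached are assumptions made for the example, not part of the paper.

```python
from typing import Callable, Iterable, Optional, Tuple


def sprt(
    observations: Iterable[float],
    p0: Callable[[float], float],
    p1: Callable[[float], float],
    a_star: float,
    b_star: float,
) -> Optional[Tuple[int, int]]:
    """Run a sequential probability ratio test (illustrative sketch).

    Returns (index of the accepted hypothesis, number of observations n),
    or None if the data are exhausted before either boundary is reached.
    """
    if not (0.0 < b_star < 1.0 < a_star):
        raise ValueError("need 0 < B* < 1 < A*")
    ratio = 1.0  # running value of p_{1j} / p_{0j}
    for n, x in enumerate(observations, start=1):
        d0, d1 = p0(x), p1(x)
        if d0 == 0.0 and d1 == 0.0:
            # Probability-zero event under both hypotheses; not handled here.
            raise ValueError("observation impossible under both hypotheses")
        ratio = float("inf") if d0 == 0.0 else ratio * d1 / d0
        if ratio >= a_star:
            return 1, n  # accept H_1
        if ratio <= b_star:
            return 0, n  # accept H_0
    return None
```

For example, with a Bernoulli variate whose success probability is 0.25 under $H_0$ and 0.75 under $H_1$, and boundaries $A^* = 8$, $B^* = 1/8$, two successes in a row give a ratio of 9 and lead to acceptance of $H_1$ at $n = 2$.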

In this paper we limit consideration to sequential tests for which $E_i(n)$ exists, where $E_i(n)$ is the expected value of $n$ when $H_i$ is true (i.e., when $p_i(x)$ is the distribution of $X$). It has been proved in [3] that all sequential probability ratio tests belong to this class. The purpose of the paper is to prove the result stated in the first section. Throughout the proof we shall find it convenient to assume that there is an a priori probability $g_i$ that $H_i$ is true ($g_0 + g_1 = 1$; we shall write $g = (g_0, g_1)$). We are aware of the fact that many statisticians believe that in most problems of practical importance either no a priori probability distribution exists, or that even where it exists the statistical decision must be made in ignorance of it; in fact we share this view. Our introduction of the a priori probability distribution is a purely technical device for achieving the proof which has no bearing on statistical methodology, and the reader will verify that this is so. We shall always assume below that $g_0 \neq 0, 1$.

Let $W_0$, $W_1$, $c$ be given positive numbers. We define

$$R = g_0\big(W_0\alpha_0 + cE_0(n)\big) + g_1\big(W_1\alpha_1 + cE_1(n)\big),$$

and call $R$ the average risk associated with a test $S$ and a given $g$ (obviously $R$ is a function of both). We shall say that $H_i$ is accepted when the decision is made that $p_i(x)$ is the distribution of $X$. We shall say that $H_0$ is rejected when $H_1$ is accepted, and vice versa. The reader may find it helpful to regard $W_i$ as a weight which measures the loss caused by rejecting $H_i$ when it is true, $c$ as the cost of a single observation, and $R$ as the average loss associated with a given $g$ and a test $S$. For mathematical purposes these are simply quantities which we manipulate in the course of the proof.
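To make the bookkeeping concrete, the average risk can be evaluated directly from the error probabilities and expected sample sizes of a test. The helper below is a sketch; its name and argument names are chosen here for illustration, and it assumes that $g_1 = 1 - g_0$.

```python
def average_risk(
    g0: float,
    w0: float,
    w1: float,
    c: float,
    alpha0: float,
    alpha1: float,
    e0_n: float,
    e1_n: float,
) -> float:
    """Average risk R = g0*(W0*alpha0 + c*E0(n)) + g1*(W1*alpha1 + c*E1(n)).

    g1 is taken to be 1 - g0 (a priori probabilities sum to one).
    """
    if not (0.0 < g0 < 1.0):
        raise ValueError("g0 must lie strictly between 0 and 1")
    g1 = 1.0 - g0
    return g0 * (w0 * alpha0 + c * e0_n) + g1 * (w1 * alpha1 + c * e1_n)
```

With $g_0 = 1/2$, $W_0 = 10$, $W_1 = 20$, $c = 1$, $\alpha_0 = 0.1$, $\alpha_1 = 0.05$, $E_0(n) = 4$ and $E_1(n) = 6$, the two bracketed terms are 5 and 7, so the average risk is 6.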


3. Role of the probability ratio. Let $g$, $W = (W_0, W_1)$, and $c$ be fixed. Let $S$ be a given sequential test, with $R(S)$ the associated risk and $n(\omega, S)$ the associated "sample size" function. Let $\psi(x_1, \dots, x_n)$ be the "decision" function; this is a function which takes only the values 0 and 1, and such that, when $x_1, \dots, x_n$ is the sample point, the hypothesis with index $\psi(x_1, \dots, x_n)$ is rejected. Define the following decision function $\varphi(x_1, \dots, x_n)$: $\varphi = 0$ when

$$\lambda = \frac{W_1 g_1 p_{1n}}{W_0 g_0 p_{0n}}$$

is greater than 1, and $\varphi = 1$ when $\lambda < 1$. When $\lambda = 1$, $\varphi$ may be 0 or 1 at pleasure.

It must be remembered that an actual decision function is a single-valued function of $(x_1, \dots, x_n)$. We note, however, that

a) the relevant properties of a test are not affected by changing the test on a set $T$ of points $\omega$ whose probability is zero according to both $H_0$ and $H_1$, i.e., changing the definition on $T$ of $n$ and/or of the decision function, leaves $\alpha_0$, $\alpha_1$, $E_0(n)$ and $E_1(n)$ unaltered. In particular, the average risk $R$ remains unchanged.

b) the set of points for which $p_{0n} = p_{1n} = 0$ and $\lambda$ is indeterminate, has probability zero according to both $H_0$ and $H_1$.

In view of the above we decide arbitrarily, in all sequential tests which we shall henceforth consider, to define $n = j$, and $\psi = 0$, whenever $p_{0j} = p_{1j} = 0$, and $n \neq 1, \dots, (j - 1)$. By this arbitrary action $R(S)$ will not be changed.


Let now

$$L_{in} = \frac{W_i g_i p_{in}}{g_0 p_{0n} + g_1 p_{1n}},$$

$$L_n = cn + \min(L_{0n}, L_{1n}).$$

We have

$$E L_{\psi n} = \sum_i g_i W_i \alpha_i,$$

where the operator $E$ denotes the expected value with respect to the joint distribution of $H_i$ and $(x_1, \dots, x_n)$, i.e., $E$ is the operator $g_0E_0 + g_1E_1$. If now the event $\{\psi(S) \neq \varphi$ and $\lambda \neq 1\}$ has positive probability according to either $H_0$ or $H_1$, we would have, for $n = n(\omega, S)$,

$$E L_{\varphi n} < E L_{\psi n}.$$

Hence, if the decision function $\psi$ connected with the test $S$ were replaced by the decision function $\varphi$, $R$ would be decreased. Since our object throughout this proof will be to make $R$ as small as possible, we shall confine ourselves henceforth, except when the contrary is explicitly stated, to tests for which $\varphi$ is the decision function. This will be assumed even if not explicitly stated.

The function $\varphi$ has not yet been uniquely defined when $\lambda = 1$. A definition convenient for later purposes will be given in the next section. $R$ is the same for all definitions.

We thus have that $\varphi$ is a function only of $\lambda$, or, what comes to the same thing when $W$ is fixed, of $r_n = p_{1n}/p_{0n}$. Define





= 


m 


oo = 




We shall now prove

Lemma 1. Let $g$, $W$, and $c$ be fixed. There exists a sequential test $S^*$ for which the average risk is a minimum. Its sample size function $n(\omega, S^*)$ can be defined by means of a properly chosen subset $K$ of the non-negative half-line as follows: For any $\omega$ consider the associated sequence

$$r_1, r_2, \dots$$

and let $j$ be the smallest integer for which $r_j \in K$. Then $n = j$. The function $n$ may be undefined on a set of points $\omega$ whose probability according to $H_0$ and $H_1$ is zero.


Let $a = (a_1, \dots, a_d)$ be any point in some finite $d$-dimensional Euclidean space, provided only that $p_{0d}(a)$ and $p_{1d}(a)$ are not both zero. Let $b = p_{1d}(a)/p_{0d}(a)$ and let $l(a) = cd + \min(L_{0d}, L_{1d})$. Let $D$ be any sequential test whatever for which $n(\omega, D) > d$ for any $\omega$ whose first $d$ coördinates are the same as those of $a$, and for which $E(n \mid a, D) < \infty$, where $E(n \mid a, D)$ is the conditional expected value of $n$ according to the test $D$ under the condition that the first $d$ coördinates of $\omega$ are the same as those of $a$. For brevity let $G$ represent the set of points $\omega$ which fulfill this last condition, i.e., that the first $d$ coördinates of $\omega$ are the same as those of $a$. Finally, let $E(L_n \mid a, D)$ be the conditional expected value of $L_n$ according to $D$ under the condition that $\omega$ is in the set $G$. We know that $\min(L_{0d}, L_{1d})$ depends only on $r_d(a) = b$.

Write

$$\nu(a) = \sup_D\,[\,l(a) - E(L_n \mid a, D)\,].$$


Let $a_0 = (a_{01}, \dots, a_{0k})$ be any point such that

$$\frac{p_{1d}(a)}{p_{0d}(a)} = \frac{p_{1k}(a_0)}{p_{0k}(a_0)}.$$

Let $D_0$ be any sequential test whatever for which $n(\omega, D_0) > k$ for any $\omega$ whose first $k$ coördinates are the same as those of $a_0$, and for which $E(n \mid a_0, D_0) < \infty$. Let

$$\nu(a_0) = \sup_{D_0}\,[\,l(a_0) - E(L_n \mid a_0, D_0)\,].$$

We shall prove that $\nu(a) = \nu(a_0)$. Thus we shall be justified in writing

$$\gamma(b) = \nu(a) = \nu(a_0).$$


Suppose, therefore, that $\nu(a) > \nu(a_0)$. Let $D_1$ be a test of the type $D$ such that

$$l(a) - E(L_n \mid a, D_1) > \frac{\nu(a) + \nu(a_0)}{2}.$$

We now partially define another sequential test $D_{10}$ of the type $D_0$ as follows: Let

$$\bar{a} = a_1, \dots, a_d, y_1, \dots, y_t$$

be any sequence such that $n(\bar{a}, D_1) = d + t$. Then for the sequence

$$\bar{a}_0 = a_{01}, \dots, a_{0k}, y_1, \dots, y_t,$$

let $n(\bar{a}_0, D_{10}) = k + t$. The decision function $\psi_0$ associated with $D_{10}$ will be partially defined as follows:

$$\psi_0(\bar{a}_0) = \varphi(\bar{a}).$$

(The reader will observe that it may happen that $\psi_0(\bar{a}_0) \neq \varphi(\bar{a}_0)$.) Since $r_d(a) = r_k(a_0)$ it follows that

$$l(a) - E(L_n \mid a, D_1) = l(a_0) - E(L_n \mid a_0, D_{10}) > \frac{\nu(a) + \nu(a_0)}{2} > \nu(a_0),$$

in violation of the definition of $\nu(a_0)$. A similar contradiction is obtained if $\nu(a) < \nu(a_0)$. Hence $\nu(a) = \nu(a_0)$ as was stated above.

We define $K$ to consist of all numbers $b$ which are such that there exist points $a$ with $r_d(a) = b$, and for which $\gamma(b) \leq 0$. We shall now prove that the test $S^*$ defined in the statement of the lemma is such that $R(S^*)$ is a minimum. Recall that the average risk is the expected value of $L_n$. Let $S$ be any other test. Let $a^* = (a_1, \dots, a_{d^*})$ be any sequence such that either $n(a^*, S^*) = d^*$, or $n(a^*, S) = d^*$, but $n(a^*, S^*) \neq n(a^*, S)$. We exclude the trivial case that the probability of the occurrence of such a sequence, under both $H_0$ and $H_1$, is zero. Let $r_{d^*}(a^*) = b^*$. The sequence $a^*$ may be one of three types:

1) $\gamma(b^*) < 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. It is more advantageous, from the point of view of diminishing the average risk, to terminate the sequential process at once, since $E(L_n \mid a^*, S) > l(a^*)$.

2) $\gamma(b^*) = 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. If $l(a^*) - E(L_n \mid a^*, S) = 0$, i.e., the supremum is actually attained by $S$, then, as far as the average risk is concerned, it makes no difference whether the sequential process is terminated with $a^*$ or continued according to $S$. If, however, $l(a^*) - E(L_n \mid a^*, S) < 0$, it is clearly disadvantageous to proceed according to $S$. It is impossible that $l(a^*) - E(L_n \mid a^*, S) > 0$, since $\gamma(b^*) = 0$.

3) $\gamma(b^*) > 0$. Hence $b^* \notin K$, $n(a^*, S) = d^*$. Clearly it is more advantageous from the point of view of diminishing the average risk not to terminate the sequential process, but to continue with at least one more observation. After one more observation we are either in case 1 or 2, where it is advantageous to terminate the sequential process, or again in case 3, where it is advantageous to take yet another observation.

We conclude that $R(S^*)$ is a minimum, as was to be proved.








4. A fundamental lemma. Consider the complement of $K$ with respect to the non-negative half-line, and from it delete all points $b'$ for which there exists no point $a$ in some $d$-dimensional Euclidean space such that $r_d(a) = b'$. The point 1 is never to be considered as of the type of $b'$, i.e., 1 is never to be deleted. Designate the resulting set by $\bar{K}$.










Our proof of the theorem to which this paper is devoted hinges on the following lemma:

Lemma 2. Let $W$, $g$, $c$ be fixed, and $\bar{K}$ be as defined above. There exist two positive numbers $A$ and $B$, with $B \leq \frac{W_0 g_0}{W_1 g_1} \leq A$, such that

a) if $b \in K$, then either $b \geq A$ or $b \leq B$;

b) if $b \in \bar{K}$, $B < b < A$.

Two remarks may be made before proceeding with the proof:

1) We may now complete the definition of $\varphi$ for tests of the type of $S^*$. The reader will recall that $\varphi$ was not uniquely defined when $\lambda = 1$, i.e., when $r_n = \frac{W_0 g_0}{W_1 g_1}$. Lemma 2 shows that it is necessary to define $\varphi$ only when $\frac{W_0 g_0}{W_1 g_1} \in K$ and is therefore either $A$ or $B$. We will define $\varphi\!\left(\frac{W_0 g_0}{W_1 g_1}\right)$ as 0 or 1, according as $\frac{W_0 g_0}{W_1 g_1}$ is $A$ or $B$, and $A \neq B$. This is simply a convenient definition which will give uniqueness. When $A = B = \frac{W_0 g_0}{W_1 g_1} \in K$, the situation is completely trivial, and we may take $\varphi = 0$ arbitrarily.

2) If $1 \in K$ the above lemma shows that the average risk is minimized (for fixed $W$, $g$, $c$, of course) by taking no observations at all. We have $\varphi = 0$ or 1 according as $1 \geq A$ or $1 \leq B$.
Proof of the lemma: Let $h > \frac{W_0 g_0}{W_1 g_1}$ be a point in $\bar{K}$. We will prove that any point $h'$ such that $\frac{W_0 g_0}{W_1 g_1} < h' < h$, and such that there exists a point $a'$ in some $d'$-dimensional Euclidean space for which $r_{d'}(a') = h'$, is also in $\bar{K}$. In a similar way it can be shown that, if $h_0 < \frac{W_0 g_0}{W_1 g_1}$ is any point in $\bar{K}$, any point $h_0'$ such that $h_0 < h_0' < \frac{W_0 g_0}{W_1 g_1}$, and such that there exists a point $a_0$ in some $d''$-dimensional Euclidean space for which $r_{d''}(a_0) = h_0'$, is also in $\bar{K}$. This will prove the lemma.

Let therefore $h$ and $h'$ be as above. Let $S^*$ be the sequential test based on $K$, with the decision function $\varphi$. Let $a$ be a point in $d$-space such that $r_d(a) = h$. Since $h \in \bar{K}$ we have $\gamma(h) > 0$.

We now wish to define partially another sequential test $S$, with a decision function which may be different from $\varphi$, as follows: Let $a'$ be defined as above. Write

$$a = (a_1, \dots, a_d),$$

$$a' = (a_1', \dots, a_{d'}').$$

Let

$$\bar{a} = a_1, \dots, a_d, y_1, \dots, y_t$$

be any sequence such that $n(\bar{a}, S^*) = d + t$. Then for the sequence

$$\bar{a}' = a_1', \dots, a_{d'}', y_1, \dots, y_t$$

let $n(\bar{a}', S) = d' + t$. The decision function $\psi$ associated with $S$ will be partially defined as follows:

$$\psi(\bar{a}') = \varphi(\bar{a}).$$


Clearly

$$(4.1)\qquad E_i(n \mid a, S^*) - d = E_i(n \mid a', S) - d' \qquad (i = 0, 1)$$

and

$$(4.2)\qquad E_i(\varphi \mid a, S^*) = E_i(\psi \mid a', S) \qquad (i = 0, 1).$$

Furthermore, we have

$$(4.3)\qquad l(a) - E(L_n \mid a, S^*) = \frac{g_0}{g_0 + g_1 h}\big\{W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)]\big\}$$

$$\qquad\qquad + \frac{g_1 h}{g_0 + g_1 h}\big\{cd - cE_1(n \mid a, S^*) - W_1E_1(\varphi \mid a, S^*)\big\}.$$

Since $\gamma(h) > 0$, and since

$$(4.4)\qquad cd - cE_1(n \mid a, S^*) - W_1E_1(\varphi \mid a, S^*) < 0,$$

we must have

$$(4.5)\qquad W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)] > 0.$$

From $h' < h$ it follows that

$$(4.6)\qquad \frac{g_0}{g_0 + g_1 h'} > \frac{g_0}{g_0 + g_1 h} \quad \text{and} \quad \frac{g_1 h'}{g_0 + g_1 h'} < \frac{g_1 h}{g_0 + g_1 h}.$$

Relations (4.1), (4.2), (4.4), (4.5) and (4.6) imply that the value of the right hand member of (4.3) is increased by replacing $\varphi$, $h$, $a$, $S^*$ and $d$ by $\psi$, $h'$, $a'$, $S$, and $d'$, respectively. This proves our lemma.

If there are values which $r_j$ cannot assume the pair $B$, $A$ might not be unique. For convenience we shall define $A$ and $B$ uniquely in the manner described below. We will always adhere to this definition thereafter.

We shall first define $\gamma(h)$ for all positive $h$ in a manner consistent with the previous definition, which defined $\gamma(h)$ only for those values of $h$ which could be assumed by $r_j$. Let $h$ be any positive number and $D(h)$ be any sequential test with the following properties:


(4.7) there exists a set $Q(h)$ of positive numbers such that $n = j$ if and only if the $j$-th member of the sequence

$$hr_1, hr_2, hr_3, \dots$$

is the first element of the sequence to be in $Q(h)$;

$$(4.8)\qquad E_i(n \mid D(h)) < \infty \qquad (i = 0, 1).$$

We define, for $h > \frac{W_0 g_0}{W_1 g_1}$,

$$(4.9)\qquad \gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1 h}\big\{W_0E_0(\varphi \mid D(h)) - cE_0(n \mid D(h))\big\} + \frac{g_1 h}{g_0 + g_1 h}\big\{-W_1E_1(\varphi \mid D(h)) - cE_1(n \mid D(h))\big\},$$

$$(4.10)\qquad \gamma(h) = \sup_{D(h)} \gamma(h \mid D(h)),$$

with a corresponding definition for $h < \frac{W_0 g_0}{W_1 g_1}$. Thus $\gamma(h)$ is defined for all positive $h$. This definition coincides with the previous definition whenever the latter is applicable. It is true that the supremum operation in (4.10) is limited to tests which depend only on the probability ratio, as (4.7) implies, but the argument of Lemma 1 shows that this limitation does not diminish the supremum. (It might appear that, for $h = \frac{W_0 g_0}{W_1 g_1}$, $\gamma(h)$ is not uniquely defined. We shall shortly see that this is not the case.)

The quantity $\gamma(h)$ depends, of course, on $g_0$ and $g_1$. To put this in evidence, we shall also write $\gamma(h, g_0, g_1)$. One can easily verify that

$$\gamma(h, g_0, g_1) = \gamma\!\left(1, \frac{g_0}{g_0 + g_1 h}, \frac{g_1 h}{g_0 + g_1 h}\right).$$

More generally, for any positive values $h$ and $h'$, we have $\gamma(h, g_0, g_1) = \gamma(h', \bar{g}_0, \bar{g}_1)$, where $\bar{g}_0$ and $\bar{g}_1$ are suitable functions of $g_0$, $g_1$, $h$, and $h'$. Thus, if $h$ is not an admissible value of the probability ratio and $h'$ is any admissible value, we can interpret the value of $\gamma(h, g_0, g_1)$ as the value of $\gamma$ corresponding to $h'$ and some properly chosen a priori probabilities $\bar{g}_0$ and $\bar{g}_1$.


We now define $A$ as the greatest lower bound of all points $h \geq \frac{W_0 g_0}{W_1 g_1}$ for which $\gamma(h) \leq 0$. We define $B$ as the least upper bound of all points $h \leq \frac{W_0 g_0}{W_1 g_1}$ for which $\gamma(h) \leq 0$. If $\gamma(h) \leq 0$ for all $h$ the above definition implies $A = B = \frac{W_0 g_0}{W_1 g_1}$.

The argument of Lemma 2 shows that $\gamma(h)$ is monotonically increasing in the interval $\left(0, \frac{W_0 g_0}{W_1 g_1}\right)$, and that $\gamma(h)$ is monotonically decreasing in the interval $\left(\frac{W_0 g_0}{W_1 g_1}, \infty\right)$.

We shall now define a sequential test $S^*(h)$ for every positive $h$. The decision function of $S^*(h)$ will be $\varphi$, and $n = j$ if and only if the $j$-th member of the sequence

$$\gamma(hr_1), \gamma(hr_2), \gamma(hr_3), \dots$$

is the first element to be $\leq 0$. We see that

$$(4.11)\qquad \gamma(h) = \gamma(h \mid S^*(h))$$

for all $h$. Incidentally, this proves that $\gamma(h)$ was uniquely defined at $h = \frac{W_0 g_0}{W_1 g_1}$.


Lemma 3. The function $\gamma(h)$ has the following properties:

a) It is continuous for all $h$.

b) $\gamma(A) = \gamma(B) = 0$.

c) $\gamma(h) < 0$ for $h > A$ or $h < B$.

Only a) and c) require proof, since b) is a trivial consequence of a) and the definition of $A$ and $B$.
Let $h$ be any point except $\frac{W_0 g_0}{W_1 g_1}$, and let $z$ be any point in a neighborhood of $h$. Within a neighborhood of $h$ both $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded. Let $\Delta$ be an arbitrarily given, positive number. Let $h'$ and $h''$ be any two points in a sufficiently small neighborhood of $h$, to be described shortly. We proceed as in the argument of Lemma 2, with the present $h'$ corresponding to $h$ of Lemma 2, the present $h''$ corresponding to $h'$ of Lemma 2, and with $S^*(h')$ corresponding to $S^*$ of Lemma 2. Since $\frac{g_0}{g_0 + g_1 z}$ and $\frac{g_1 z}{g_0 + g_1 z}$ are continuous functions of $z$, and since $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded functions of $z$, we conclude that, when the neighborhood of $h$ is sufficiently small,

$$\gamma(h'') \geq \gamma(h') - \Delta.$$

Reversing the roles of $h'$ and $h''$ we obtain that in this neighborhood

$$\gamma(h') \geq \gamma(h'') - \Delta,$$

and conclude that

$$|\gamma(h') - \gamma(h'')| \leq \Delta.$$

Since $\Delta$ was arbitrary, this implies the continuity of $\gamma(h)$ everywhere, except perhaps at $h = \frac{W_0 g_0}{W_1 g_1}$.

To prove continuity at $h = \frac{W_0 g_0}{W_1 g_1}$, proceed as follows: Using the above argument and the definition (4.9), (4.10), we prove that $\gamma(h)$ is continuous on the right at $h = \frac{W_0 g_0}{W_1 g_1}$. Using, at the point $h = \frac{W_0 g_0}{W_1 g_1}$, the definition of $\gamma(h \mid D(h))$ for $h \leq \frac{W_0 g_0}{W_1 g_1}$, i.e.,

$$(4.12)\qquad \gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1 h}\big\{-W_0E_0(1 - \varphi \mid D(h)) - cE_0(n \mid D(h))\big\} + \frac{g_1 h}{g_0 + g_1 h}\big\{W_1E_1(1 - \varphi \mid D(h)) - cE_1(n \mid D(h))\big\},$$

(4.10) and (4.11), we prove that $\gamma(h)$ is continuous on the left at $h = \frac{W_0 g_0}{W_1 g_1}$. This proves a).
To prove c), we proceed as follows: Suppose for $h_0 > A$ we had $\gamma(h_0) = 0$. Since

$$\{-W_1E_1(\varphi \mid S^*(h_0)) - cE_1(n \mid S^*(h_0))\} < 0,$$

we would have that

$$\{W_0E_0(\varphi \mid S^*(h_0)) - cE_0(n \mid S^*(h_0))\} > 0.$$

An argument like that of Lemma 2 would then show that $\gamma(h) > 0$ for $\frac{W_0 g_0}{W_1 g_1} < h < h_0$. This, however, is impossible, because it is a violation of the definition of $A$.

In a similar way we prove that if $h < B$, $\gamma(h) < 0$. This proves c) and with it the lemma.


5. The behavior of A and B. Lemma 4. Let $g$ and $c$ be fixed. Then $A$ and $B$ are continuous functions of $W_0$ and $W_1$.

Proof: It will be sufficient to prove that $A$ is continuous, the proof for $B$ being similar. Suppose $A > B$. Let $h_1$ and $h_2$ be such that

a) $B < h_1 < A < h_2$,

b) $h_2 - h_1 < \Delta$ for an arbitrary positive $\Delta$.

We write $\gamma(h)$ temporarily as $\gamma(h, W_0, W_1)$ in order to exhibit the dependence on $W_0$ and $W_1$. Then

$$\gamma(h_1, W_0, W_1) > 0;$$

$$\gamma(h_2, W_0, W_1) < 0.$$

It follows from (4.9) that $\gamma(h \mid D(h))$ is continuous in $W_0$, $W_1$, uniformly in $D(h)$. Hence $\gamma(h, W_0, W_1) = \sup \gamma(h \mid D(h))$ is also continuous in $W_0$, $W_1$. Hence, for $\Delta W_0$ and $\Delta W_1$ sufficiently small,

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) > 0;$$

$$\gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Therefore

$$h_1 < A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2,$$

which proves continuity, since $\Delta$ was arbitrary.

If $\frac{W_0 g_0}{W_1 g_1} = A = B$, we take $h_1 < \frac{W_0 g_0}{W_1 g_1} < h_2$, $h_2 - h_1 < \Delta$, and by a similar argument show that

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0;$$

$$\gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Thus

$$h_1 < B(W_0 + \Delta W_0, W_1 + \Delta W_1) \leq A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2.$$

This proves the lemma.
Lemma 5. Let $g$, $c$, and $W_1$ be fixed. $A$ is strictly monotonic in $W_0$. As $W_0$ approaches 0, $A$ approaches 0; as $W_0$ approaches $+\infty$, $A$ also approaches $+\infty$.

Proof: Since $A \geq \frac{W_0 g_0}{W_1 g_1}$, $A \to +\infty$ as $W_0 \to +\infty$. If $W_0 < c$ no reduction in average risk could compensate for taking even a single observation, no matter what the value of $h$. Hence $\gamma(h) < 0$ for all $h$ when $W_0 < c$, so that $A = B$. Since $B \leq \frac{W_0 g_0}{W_1 g_1}$, $B \to 0$ as $W_0 \to 0$. Hence $A \to 0$ as $W_0 \to 0$.

It is evident from (4.9) that $\gamma(h \mid D(h))$ is non-decreasing with increasing $W_0$ (everything else fixed). Hence also

$$\gamma(h) = \sup_{D(h)} \gamma(h \mid D(h))$$

is non-decreasing with increasing $W_0$, for fixed $h > \frac{W_0 g_0}{W_1 g_1}$ and fixed $W_1$. For a positive $\Delta$ sufficiently small and for any $h$ such that $A < h < A + \Delta$, we have that

$$E_0(\varphi \mid S^*(h)) > 0.$$

Hence, for such $h$, $\gamma(h, W_0, W_1)$ is strictly monotonically increasing with increasing $W_0$. Therefore $A$ is (strictly) monotonically increasing with increasing $W_0$.

We now define the function $W_0(W_1, \delta)$ of the two positive arguments $W_1$, $\delta$ so that

$$A(W_0(W_1, \delta), W_1) = \delta.$$

By Lemma 5 such a function exists and is single-valued.


6. Properties of the function $W_0(W_1, \delta)$. Lemma 6. $W_0(W_1, \delta)$ is continuous in $W_1$.

Proof: Let

$$\lim_{N \to \infty} W_{1N} = W_1,$$

and suppose that the sequence $\{W_0(W_{1N}, \delta)\}$ did not converge. Suppose $W_0'$ and $W_0''$ were two distinct limit points of this sequence. From the continuity of $A$ (Lemma 4) it would follow that

$$A(W_0', W_1) = A(W_0'', W_1).$$

This, however, violates Lemma 5. The only remaining possibility to be considered is that

$$\lim_{N \to \infty} W_0(W_{1N}, \delta) = \infty.$$

If that were the case, then, since $A \geq \frac{W_0 g_0}{W_1 g_1}$, it would follow that $A \to \infty$, in violation of the fact that $A = \delta$.
Lemma 7. We have, for fixed $\delta$,

$$\lim_{W_1 \to 0} W_0(W_1) = 0;$$

$$\lim_{W_1 \to \infty} W_0(W_1) = \infty.$$

Proof: If, for small $W_1$, $W_0(W_1)$ were bounded below by a positive number, then, since $A \geq \frac{g_0 W_0(W_1, \delta)}{g_1 W_1}$, we could make $A$ arbitrarily large by taking $W_1$ sufficiently small, in violation of the fact that $A = \delta$. To prove the second half of the lemma, assume that $W_0(W_1)$ is bounded above as $W_1 \to \infty$. Then $B$ $\left(\leq \frac{W_0 g_0}{W_1 g_1}\right)$ will approach zero as $W_1 \to \infty$. Let $h$ be fixed so that $B < h < \delta$. Consider the totality of points $\omega$ for which there exists an integer $n^*(\omega)$ such that:

$$hr_{n^*} \leq B;$$

$$B < hr_j < \delta, \qquad j < n^*.$$

The conditional expected value of $n^*$ in this totality, when $H_0$ is true, may be made arbitrarily large by making $B$ sufficiently small. Hence, when $W_1$ is sufficiently large, for fixed but arbitrary $h < \delta$, the optimum procedure from the point of minimizing the average risk is to reject $H_0$ at once without taking any more observations. This, however, contradicts the fact that $h < \delta$, and proves the lemma.
Lemma 8. We have, for fixed $\delta > 1$,

$$\lim_{W_1 \to 0} B(W_0(W_1, \delta), W_1) = \delta;$$

$$\lim_{W_1 \to \infty} B(W_0(W_1, \delta), W_1) = 0.$$

Proof: By Lemma 7,

$$\lim_{W_1 \to 0} W_0(W_1) = 0.$$

When, for fixed $c$, both $W_0$ and $W_1$ are small enough, then, no matter what the value of $h$, $\gamma(h) < 0$. Hence $A = B$, which proves the first half of the lemma.

Let now $\{W_{1N}\}$ be a sequence such that $\lim W_{1N} = \infty$. Let $\delta > 1$. For the sake of brevity we write $B(W_{1N})$ instead of

$$B(W_0(W_{1N}, \delta), W_{1N}).$$

Suppose that, for sufficiently large $N$, $B(W_{1N})$ is bounded below by a positive number. Hence, for sufficiently large $N$, the probability of rejecting $H_1$ when it is true is bounded below by a positive number. Moreover, since $B \leq \frac{W_0 g_0}{W_{1N} g_1} \leq \delta$ it follows that, for $N$ sufficiently large, $\frac{W_0 g_0}{W_{1N} g_1}$ is bounded above and below by positive constants. Thus, for large $N$ the average risk of the test defined by $B(W_{1N})$, $\delta$, is greater than $u g_1 W_{1N}$, where $u$ is a positive constant which does not depend on $N$. Moreover, from the definition of $B(W_{1N})$, this risk is a minimum.

Let $\epsilon$ be a positive number such that $\epsilon\left(\frac{W_0 g_0}{W_{1N} g_1} + 1\right) < \frac{u}{2}$ for all $N$ sufficiently large. Let $V_1$, $V_2$, with $0 < V_1 < 1 < V_2$, be two constants such that, for the sequential probability ratio test determined by them, both $\alpha_0$ and $\alpha_1$ are $< \epsilon$. Of course $E_0 n$ and $E_1 n$ are finite and determined by the test. For this test the average risk is less than

$$\epsilon(g_0 W_0 + g_1 W_{1N}) + cg_0E_0n + cg_1E_1n < \frac{u}{2}\, g_1 W_{1N} + cg_0E_0n + cg_1E_1n < \frac{3u}{4}\, g_1 W_{1N}$$

for $W_{1N}$ large enough. This however contradicts the fact that the minimum risk is $> u g_1 W_{1N}$, and proves the lemma.


7. Proof of the theorem. Let a given sequential probability ratio test $S_0$ be defined by $B^*$, $A^*$; $B^* < 1 < A^*$. Let $\alpha_i(S_0)$ be the probability, according to $S_0$, of rejecting $H_i$ when it is true. Let $c$ be fixed.

By Lemma 4, $B$ is a continuous function of $W_0$ and $W_1$. Let $\delta = A^*$ in Lemma 8. Then there exists a pair $W_0$, $W_1$, with $W_0 = W_0(W_1, A^*)$, such that

$$A(W_0, W_1) = A^*,$$

$$B(W_0, W_1) = B^*.$$

Hence the average risk

$$\sum_i g_i\,[W_i\alpha_i(S_0) + cE_i^0(n)],$$

corresponding to the sequential test $S_0$ is a minimum.

Now let $S_1$ be any other test for deciding between $H_0$ and $H_1$ and such that $\alpha_i(S_1) \leq \alpha_i(S_0)$, and $E_i^1(n)$ exists ($i = 0, 1$). Then

$$\sum_i g_i\,[W_i\alpha_i(S_0) + cE_i^0(n)] \leq \sum_i g_i\,[W_i\alpha_i(S_1) + cE_i^1(n)].$$

Since $\alpha_i(S_1) \leq \alpha_i(S_0)$, we have

$$\sum_i g_iE_i^0(n) \leq \sum_i g_iE_i^1(n).$$

Now $g_0$, $g_1$ were arbitrarily chosen (subject, of course, to the obvious restrictions). Hence it must be that

$$E_i^0(n) \leq E_i^1(n).$$

This, however, is the desired result.
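The two closing steps can be spelled out as follows. This is only one reading of the argument, written with $E_i^j(n)$ as in the Summary; the limiting step at the end is an assumption about how "arbitrarily chosen" is meant to be used, not a quotation from the paper.

```latex
% Step 1: subtract the minimal risk of S_0 from the risk of S_1.
\[
c \sum_i g_i \bigl[ E_i^1(n) - E_i^0(n) \bigr]
  \;\ge\; \sum_i g_i W_i \bigl[ \alpha_i(S_0) - \alpha_i(S_1) \bigr]
  \;\ge\; 0 ,
\]
% since W_i > 0, g_i > 0 and \alpha_i(S_1) \le \alpha_i(S_0).

% Step 2: the expectations E_i^j(n) do not depend on g, and the
% inequality holds for every g_0 in (0,1) with g_1 = 1 - g_0.
% Letting g_0 \to 1 and g_0 \to 0 respectively gives
\[
E_0^0(n) \le E_0^1(n), \qquad E_1^0(n) \le E_1^1(n).
\]
```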


REFERENCES

[1] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16 (1945), pp. 117-186.

[2] A. Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.

[3] Charles Stein, "A note on cumulative sums," Annals of Math. Stat., Vol. 17 (1946), pp. 498-499.











LIMITING DISTRIBUTION OF A ROOT OF A DETERMINANTAL 
EQUATION 


By D. N. Nanpa 
Institute of Statistics, University of North Carolina 


1. Summary. The exact distribution of a root of a determinantal equation when the roots are arranged in a monotonic order was obtained by S. N. Roy [3] in 1943. A different method for deriving the distribution of any one of these roots has been described by the author in [2]. In the present paper the limiting forms of these distributions are obtained. This paper gives a method by which the limiting distributions can be obtained without undergoing an inordinate amount of mathematical labor.


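2. Introduction. If $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ are two $p$-variate sample matrices with $n_1$ and $n_2$ degrees of freedom and $S = \|xx'\|/n_1$ and $S^* = \|x^*x^{*\prime}\|/n_2$ are the covariance matrices which under the null hypothesis are independent estimates of the same population covariance matrix, then the joint distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1S$ and $B = n_2S^*$, was obtained by Hsu [1] in 1939 and is

$$(1)\qquad R(l, \mu, \nu) = \pi^{l/2} \prod_{i=1}^{l} \frac{\Gamma\!\left(\frac{\mu + \nu + l + i - 2}{2}\right)}{\Gamma\!\left(\frac{\mu + i - 1}{2}\right)\Gamma\!\left(\frac{\nu + i - 1}{2}\right)\Gamma\!\left(\frac{i}{2}\right)} \prod_{i=1}^{l} \theta_i^{\frac{\mu - 2}{2}} \prod_{i=1}^{l} (1 - \theta_i)^{\frac{\nu - 2}{2}} \prod_{i<j} (\theta_i - \theta_j),$$

$$(0 \leq \theta_l \leq \theta_{l-1} \leq \cdots \leq \theta_1 \leq 1),$$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$ and $\nu = n_2 - p + 1$. The distribution density may be expressed as

$$(2)\qquad R(l, m, n) = c(l, m, n) \prod_{i=1}^{l} \theta_i^m (1 - \theta_i)^n \prod_{i<j} (\theta_i - \theta_j),$$

where $m = \mu/2 - 1$ and $n = \nu/2 - 1$.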


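3. Method. Let $\theta_i = \zeta_i/n$ in (2). The joint distribution reduces to

$$(3)\qquad \frac{c(l, m, n)}{n^{lm + l(l+1)/2}} \prod_{i=1}^{l} \zeta_i^m (1 - \zeta_i/n)^n \prod_{i<j} (\zeta_i - \zeta_j)\, d\zeta_1 \cdots d\zeta_l,$$

$$(0 \leq \zeta_l \leq \zeta_{l-1} \leq \cdots \leq \zeta_1 \leq n).$$

As $n$ tends to infinity the limit of (3) is

$$(4)\qquad K(l, m) \prod_{i=1}^{l} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum_i \zeta_i} \prod_{i=1}^{l} d\zeta_i,$$

$$(0 \leq \zeta_l \leq \zeta_{l-1} \leq \cdots \leq \zeta_1 < \infty).$$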
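The value of $K(l, m)$ is

$$K(l, m) = \lim_{n \to \infty} \frac{c(l, m, n)}{n^{lm + l(l+1)/2}} = \lim_{n \to \infty} \frac{\pi^{l/2} \prod_{i=1}^{l} \Gamma\!\left(\frac{2m + 2n + l + i + 2}{2}\right)}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + i + 1}{2}\right)\Gamma\!\left(\frac{2n + i + 1}{2}\right)\Gamma(i/2) \cdot n^{lm + l(l+1)/2}}$$

$$= \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + i + 1}{2}\right)\Gamma(i/2)} \lim_{n \to \infty} \frac{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + 2n + l + i + 2}{2}\right)}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2n + i + 1}{2}\right) n^{lm + l(l+1)/2}}.$$

By using Stirling's approximation for gamma functions and after simplification we get

$$\lim_{n \to \infty} \frac{\prod_{i=1}^{l} \Gamma\!\left(\frac{l + i + 2m + 2n + 2}{2}\right)}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2n + i + 1}{2}\right) n^{lm + l(l+1)/2}} = 1.$$

Hence

$$K(l, m) = \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + i + 1}{2}\right)\Gamma(i/2)},$$

and therefore

$$(5)\qquad \begin{aligned} K(2, m) &= 2^{2m+1}/\Gamma(2m + 2),\\ K(3, m) &= 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)],\\ K(4, m) &= 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)],\\ K(5, m) &= 2^{4m+9}/[3\,\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)]. \end{aligned}$$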
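The special cases can be checked numerically against the general product formula $K(l, m) = \pi^{l/2} / \prod_{i=1}^{l} \Gamma\!\left(\frac{2m+i+1}{2}\right)\Gamma(i/2)$. The sketch below uses only the standard library; the function names `k_general` and `k_closed` are chosen here for illustration.

```python
import math


def k_general(l: int, m: float) -> float:
    """K(l, m) from the general product formula."""
    denom = 1.0
    for i in range(1, l + 1):
        denom *= math.gamma((2 * m + i + 1) / 2) * math.gamma(i / 2)
    return math.pi ** (l / 2) / denom


def k_closed(l: int, m: float) -> float:
    """Closed forms for l = 2, 3, 4, 5 as listed in the text."""
    g = math.gamma
    if l == 2:
        return 2 ** (2 * m + 1) / g(2 * m + 2)
    if l == 3:
        return 2 ** (2 * m + 3) / (g(m + 1) * g(2 * m + 3))
    if l == 4:
        return 2 ** (4 * m + 5) / (g(2 * m + 2) * g(2 * m + 4))
    if l == 5:
        return 2 ** (4 * m + 9) / (3 * g(m + 1) * g(2 * m + 3) * g(2 * m + 5))
    raise ValueError("closed form listed only for l = 2, 3, 4, 5")
```

The agreement follows from the duplication formula for the gamma function, which pairs the factors $\Gamma(z)\Gamma(z + \tfrac12)$ in the general product.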


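Let

$$(6)\qquad G_{l,m}(x) = K(l, m) \int_{0 \leq \zeta_l \leq \cdots \leq \zeta_1 \leq x} \prod_{i=1}^{l} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum_i \zeta_i} \prod_{i=1}^{l} d\zeta_i.$$

It can easily be observed that

$$G_{l,m}(x) = \Pr(\zeta_1 \leq x) = \lim_{n \to \infty} \Pr(n\theta_1 \leq x) = \lim_{n \to \infty} \Pr\!\left(\theta_1 \leq \frac{x}{n}\right).$$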














Thus the limiting form of the distribution of the largest root can be obtained by integrating the density given in (4) according to the method described by the author in [2]. It is, however, observed that the mathematical labor is reduced considerably by adopting the following method.

Referring to the results of the exact distribution of the largest root given in [2], let $F_{l,m,n}(x) = (0, l, l - 1, \dots, 1, x; m, n)$; thus $F_{2,m,n}(x) = (0, 2, 1, x; m, n)$ and $F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n)$. Then $c(l, m, n)F_{l,m,n}(x)$ is the probability that none of the roots $\theta_i$ exceeds $x$, and is thus the cumulative distribution function of the greatest root. We shall show that $\lim_{n \to \infty} c(l, m, n)F_{l,m,n}(x/n) = G_{l,m}(x)$. The reader is, however, asked to refer to [2] for the detailed explanation of the notations and certain mathematical operations used in this paper.


4. Limiting distribution of the largest root. We will derive the distribution of the largest root for l = 2 and 3 by two methods. The straightforward method will be named Method A. A second method, which proves to be much simpler, will be called Method B.

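(a) l = 2.
(i) METHOD A. We have

$$\Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \iint_{0<\zeta_2<\zeta_1<x} (\zeta_1\zeta_2)^m (\zeta_1 - \zeta_2)\, e^{-\zeta_1-\zeta_2}\, d\zeta_1\, d\zeta_2.$$

By using the method described in [2], we have

$$G_{2,m}(x) = K(2, m) \left\{ \iint_{0<\zeta_2<\zeta_1<x} - \iint_{0<\zeta_1<\zeta_2<x} \right\} \zeta_1^{m+1}\zeta_2^{m}\, e^{-\zeta_1-\zeta_2}\, d\zeta_1\, d\zeta_2$$
$$= K(2, m)\{T_0^x(y, 1, x; m+1) - T_0^x(0, 1, y; m+1)\},$$

where

$$T_a^b\, g(y) = \int_a^b g(y)\, y^m e^{-y}\, dy,$$

and

$$(a, 1, b; m+1) = \int_a^b \zeta^{m+1} e^{-\zeta}\, d\zeta = (a^{m+1}e^{-a} - b^{m+1}e^{-b}) + (m+1)(a, 1, b; m). \tag{7}$$

Hence,

$$G_{2,m}(x) = K(2, m)\, T_0^x[y^{m+1}e^{-y} - x^{m+1}e^{-x} + (m+1)(y, 1, x; m) + y^{m+1}e^{-y} - (m+1)(0, 1, y; m)]$$
$$= K(2, m)\, T_0^x[2y^{m+1}e^{-y} - x^{m+1}e^{-x}],$$

as $T_0^x[(y, 1, x; m) - (0, 1, y; m)] = 0$.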







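Therefore

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \left\{ 2\int_0^x y^{2m+1} e^{-2y}\, dy - x^{m+1} e^{-x} \int_0^x y^m e^{-y}\, dy \right\}. \tag{8}$$

When $x = \infty$, $G_{2,m}(x) = 1$; hence $K(2, m) = 2^{2m+1}/\Gamma(2m+2)$, the value given in (5).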

Now we shall derive the result by Method B. 

(ii) METHOD B.

$$F_{2,m,n}(x) = (0, 2, 1, x; m, n) = \frac{1}{m+n+2} \left\{ 2\int_0^x y^{2m+1}(1-y)^{2n+1}\, dy - x^{m+1}(1-x)^{n+1} \int_0^x y^m (1-y)^n\, dy \right\},$$

a result given in [2].
Replacing x by x/n, we get

$$(0, 2, 1, x/n; m, n) = \frac{1}{m+n+2} \left\{ 2\int_0^{x/n} y^{2m+1}(1-y)^{2n+1}\, dy - (x/n)^{m+1}(1 - x/n)^{n+1} \int_0^{x/n} y^m (1-y)^n\, dy \right\};$$

also, letting y = u/n, we have

$$(0, 2, 1, x/n; m, n) = \frac{1}{(m+n+2)\, n^{2m+2}} \left\{ 2\int_0^x u^{2m+1}(1 - u/n)^{2n+1}\, du - x^{m+1}(1 - x/n)^{n+1} \int_0^x u^m (1 - u/n)^n\, du \right\}.$$

Thus

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = \lim_{n\to\infty} \Pr(\theta_1 \le x/n) = \lim_{n\to\infty} c(2, m, n)(0, 2, 1, x/n; m, n)$$
$$= \frac{2^{2m+1}}{\Gamma(2m+2)} \left\{ 2\int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du \right\},$$

which is the same as (8), obtained by Method A.
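The convergence asserted by Method B for l = 2 can be examined numerically. The sketch below is not from the paper: it assumes the constant $c(2, m, n) = \pi\,\Gamma(m+n+\tfrac52)\Gamma(m+n+3) / [\Gamma(m+1)\Gamma(m+\tfrac32)\Gamma(n+1)\Gamma(n+\tfrac32)\Gamma(\tfrac12)\Gamma(1)]$, uses a plain composite Simpson rule for the integrals, and the names `simpson`, `limiting_cdf` and `exact_cdf` are illustrative.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def limiting_cdf(x, m):
    """Limiting distribution of n*theta_1 for l = 2, as in (8)."""
    k2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    first = 2 * simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    second = x ** (m + 1) * math.exp(-x) * simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    return k2 * (first - second)

def exact_cdf(x, m, n):
    """c(2, m, n) * (0, 2, 1, x/n; m, n), i.e. Pr(n*theta_1 <= x) for finite n."""
    t = x / n
    # Assumed normalising constant c(2, m, n), evaluated on the log scale.
    log_c = (math.log(math.pi)
             + math.lgamma(m + n + 2.5) + math.lgamma(m + n + 3)
             - math.lgamma(m + 1) - math.lgamma(m + 1.5)
             - math.lgamma(n + 1) - math.lgamma(n + 1.5)
             - math.lgamma(0.5) - math.lgamma(1.0))
    first = 2 * simpson(lambda y: y ** (2 * m + 1) * (1 - y) ** (2 * n + 1), 0.0, t)
    second = t ** (m + 1) * (1 - t) ** (n + 1) * simpson(lambda y: y ** m * (1 - y) ** n, 0.0, t)
    return math.exp(log_c) * (first - second) / (m + n + 2)

# The finite-n value approaches the limiting value as n grows.
gap = abs(exact_cdf(2.0, 1, 4000) - limiting_cdf(2.0, 1))
```

With m = 1 and x = 2 the two values should differ only by a term of order 1/n, which is what the comparison at n = 4000 is meant to illustrate.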
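(b) l = 3.
(i) METHOD A. We have

$$\Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \iiint_{0<\zeta_3<\zeta_2<\zeta_1<x} \prod_{i=1}^{3} \zeta_i^m \prod_{i<j} (\zeta_i - \zeta_j)\, e^{-\sum \zeta_i} \prod_{i=1}^{3} d\zeta_i$$
$$= K(3, m) \iiint_{0<\zeta_3<\zeta_2<\zeta_1<x} (\zeta_1\zeta_2\zeta_3)^m e^{-\zeta_1-\zeta_2-\zeta_3} \{1, 2, 3\}\, d\zeta_1\, d\zeta_2\, d\zeta_3,$$

where $\{1, 2, 3\} = \zeta_1\zeta_2\{1, 2\} + \zeta_3\zeta_1\{3, 1\} + \zeta_2\zeta_3\{2, 3\}$, with $\{i, j\} = \zeta_i - \zeta_j$, as given in [2].

Or

$$G_{3,m}(x) = K(3, m) \left\{ \iiint_{0<\zeta_3<\zeta_2<\zeta_1<x} + \iiint_{0<\zeta_1<\zeta_3<\zeta_2<x} + \iiint_{0<\zeta_2<\zeta_1<\zeta_3<x} \right\} (\zeta_1\zeta_2\zeta_3)^m e^{-\zeta_1-\zeta_2-\zeta_3}\, \zeta_1\zeta_2\{1, 2\}\, d\zeta_1\, d\zeta_2\, d\zeta_3$$
$$= K(3, m)\{T_0^x(y, 2, 1, x; m+1) - T_0^x(0, 2, y, 1, x; m+1) + T_0^x(0, 2, 1, y; m+1)\},$$

where

$$(a, 2, 1, b; m) = \iint_{a<\zeta_2<\zeta_1<b} (\zeta_1\zeta_2)^m (\zeta_1 - \zeta_2)\, e^{-\zeta_1-\zeta_2}\, d\zeta_1\, d\zeta_2,$$

and $(a, 2, b, 1, c; m)$ denotes the same integral taken over $a < \zeta_2 < b < \zeta_1 < c$. The middle term enters with a negative sign because $\zeta_1 < \zeta_2$ in the second region.

We have already obtained

$$(0, 2, 1, x; m) = G_{2,m}(x)/K(2, m) = 2\int_0^x y^{2m+1} e^{-2y}\, dy - x^{m+1} e^{-x} \int_0^x y^m e^{-y}\, dy,$$

as given in (8).
We also need the following results which are obtained by the method described for l = 2.

$$(a, 2, 1, b; m) = 2\int_a^b u^{2m+1} e^{-2u}\, du - (a^{m+1}e^{-a} + b^{m+1}e^{-b}) \int_a^b u^m e^{-u}\, du, \tag{10}$$

and

$$(a, 2, b, 1, c; m) = b^{m+1}e^{-b} \int_a^c u^m e^{-u}\, du - a^{m+1}e^{-a} \int_b^c u^m e^{-u}\, du - c^{m+1}e^{-c} \int_a^b u^m e^{-u}\, du. \tag{11}$$

Using these results we have

$$G_{3,m}(x) = K(3, m)\, T_0^x \Big[ 2\int_y^x u^{2m+3} e^{-2u}\, du - (y^{m+2}e^{-y} + x^{m+2}e^{-x}) \int_y^x u^{m+1} e^{-u}\, du - y^{m+2}e^{-y} \int_0^x u^{m+1} e^{-u}\, du$$
$$+ x^{m+2}e^{-x} \int_0^y u^{m+1} e^{-u}\, du + 2\int_0^y u^{2m+3} e^{-2u}\, du - y^{m+2}e^{-y} \int_0^y u^{m+1} e^{-u}\, du \Big].$$

Simplifying we get

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \Big\{ 2\int_0^x u^{2m+3} e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2\int_0^x u^{2m+2} e^{-2u}\, du \int_0^x u^{m+1} e^{-u}\, du$$
$$- x^{m+2}e^{-x} \Big[ 2\int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \Big] \Big\}, \tag{12}$$

where $K(3, m) = 2^{2m+3}/[\Gamma(m+1)\Gamma(2m+3)]$.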


| 







(ii) METHOD B.

$$F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n) = \frac{1}{m+n+3} [2(0, 1, x; 2m+3, 2n+1)(0, 1, x; m, n)$$
$$- 2(0, 1, x; m+1, n)(0, 1, x; 2m+2, 2n+1) - (0, x; m+2, n+1)(0, 2, 1, x; m, n)],$$

a result given in [2].
Replacing x by x/n and putting u/n for the variate y of integration, we have,

$$F_{3,m,n}(x/n) = (0, 3, 2, 1, x/n; m, n) = \frac{1}{(m+n+3)\, n^{3m+5}} \Big[ 2\int_0^x u^{2m+3}(1 - u/n)^{2n+1}\, du \int_0^x u^m (1 - u/n)^n\, du$$
$$- 2\int_0^x u^{m+1}(1 - u/n)^n\, du \int_0^x u^{2m+2}(1 - u/n)^{2n+1}\, du$$
$$- \frac{n\, x^{m+2}(1 - x/n)^{n+1}}{m+n+2} \Big\{ 2\int_0^x u^{2m+1}(1 - u/n)^{2n+1}\, du - x^{m+1}(1 - x/n)^{n+1} \int_0^x u^m (1 - u/n)^n\, du \Big\} \Big].$$

Hence

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = \lim_{n\to\infty} \Pr(\theta_1 \le x/n) = \lim_{n\to\infty} c(3, m, n) F_{3,m,n}(x/n)$$
$$= K(3, m) \Big\{ 2\int_0^x u^{2m+3} e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2\int_0^x u^{2m+2} e^{-2u}\, du \int_0^x u^{m+1} e^{-u}\, du$$
$$- x^{m+2}e^{-x} \Big[ 2\int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \Big] \Big\},$$

where $K(3, m) = 2^{2m+3}/[\Gamma(m+1)\Gamma(2m+3)]$.
This result is the same as (12) obtained by Method A.
We have thus shown that Method B is applicable for obtaining the limiting forms of the distribution of the largest root and that it is much simpler than the straightforward method called Method A here.


The limiting distributions for the largest root for l = 4 and 5 are listed below.
(c) l = 4.

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = \lim_{n\to\infty} \Pr(\theta_1 \le x/n) = G_{4,m}(x),$$


where 


{ 
s. | » m+5 —2u G m\9 7 2m+4 —2u 
= K(4, m) 2] une du ne o 2 | wre ™ du 


x zx 
. ‘ ‘ 6 . Gro (x) 
Im +9 > 16 = zi 2, a 
2 | ue“ du — x”"e “| une“ du + (m + 2) 
0 0 K (2, m) 


al Genri(2) arta Gale 
2 | 2m 3, 2u a 2,m+1\hs _ x" 3o—r ae 
+ -"s du K@, m +1) re = (3, m)f? 
















where

$$K(4, m) = 2^{4m+5}/[\Gamma(2m+2)\Gamma(2m+4)].$$

(d) l = 5.

$$\lim_{n\to\infty} \Pr(n\theta_1 \le x) = \lim_{n\to\infty} \Pr(\theta_1 \le x/n) = G_{5,m}(x)$$
= K(5, m) 12 [ en tt oo du Gs, oe -2 [ow 2m+6 ed 
0 k(3, m | 
2 | wnt oo du | u"e “du —2 [ wnts oe ly 
0 0 0 
. oo eS Ga, = Gs, m(2) 
[u e “du — 2x”'“e Km y+ (m + 3) K(, m) 


z z zx 
2 5 —2 2 5 -—2 — 
os 2 | er""é *du{2 [ wnt e ial ue“ du 
0 0 0 


zx z 
2m+3 2 +2 — +3 — 
— 2 | or" mal une “du — xe” 
0 0 


2] une du — a te | ue “du+ (m+ 2) Ge, m(x) 
0 0 K (2, m) 


G; m+i(x) [ 2m+4 -2u m+4 -—z2r Gs, n(x) \ 
o> @ eee. ide fe 
K(8, m + 1) Jo 7 ill zie Kk (4, m)§’ 








where

$$K(5, m) = 2^{4m+9}/[3\,\Gamma(m+1)\Gamma(2m+3)\Gamma(2m+5)].$$


5. Limiting distribution of the smallest root. It was shown in [2] that the 
exact distribution of the smallest root can be obtained by using the relation 


$$\Pr(\theta_l \le x) = 1 - \Pr(\theta_1 \le 1 - x \mid \nu, \mu).$$


This relation, however, does not help in obtaining the limiting distribution of 
the smallest root from that of the largest root. The limiting distribution of the 
smallest root can be obtained by the method illustrated below. 

(a) l = 2.

The exact distribution of the smallest root $\theta_2$ can be expressed as

$$\Pr(\theta_2 \le x) = c(2, m, n)\{(0, 2, 1, x; m, n) + (0, 2, x, 1, z; m, n)\},$$

where z = 1. Replacing x by x/n, we get

$$\Pr(\theta_2 \le x/n) = c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\},$$

where

$$(0, 2, 1, x/n; m, n) = \frac{1}{m+n+2} \Big\{ 2\int_0^{x/n} y^{2m+1}(1-y)^{2n+1}\, dy - (0, x/n; m+1, n+1) \int_0^{x/n} y^m (1-y)^n\, dy \Big\},$$

and

$$(0, 2, x/n, 1, z; m, n) = \frac{1}{m+n+2} \Big\{ (0, x/n; m+1, n+1) \int_0^{z} y^m (1-y)^n\, dy - (0, z; m+1, n+1) \int_0^{x/n} y^m (1-y)^n\, dy \Big\},$$

as obtained from (6) of [2]. Here $(0, x; m+1, n+1)$ stands for $x^{m+1}(1-x)^{n+1}$.
The limiting distribution of $\theta_2$ is

$$\lim_{n\to\infty} \Pr(\theta_2 \le x/n) = \lim_{n\to\infty} c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\}. \tag{13}$$

Putting u/n for y, the variate of integration, and allowing n to tend to infinity, we have

$$\lim_{n\to\infty} c(2, m, n)(0, 2, 1, x/n; m, n) = K(2, m) \Big\{ 2\int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \Big\},$$

$$\lim_{n\to\infty} c(2, m, n)(0, 2, x/n, 1, z; m, n) = K(2, m)\, x^{m+1} e^{-x} \int_0^\infty u^m e^{-u}\, du = K(2, m)\, x^{m+1} e^{-x}\, \Gamma(m+1).$$

Substituting these results in (13) we have

$$\lim_{n\to\infty} \Pr(n\theta_2 \le x) = \lim_{n\to\infty} \Pr(\theta_2 \le x/n)$$
$$= K(2, m) \Big\{ 2\int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du + x^{m+1} e^{-x}\, \Gamma(m+1) \Big\},$$

where

$$K(2, m) = 2^{2m+1}/\Gamma(2m+2).$$
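As a check on the two limiting distributions for l = 2, the case m = 0 can be worked in closed form: carrying out the integrals by hand gives $1 - e^{-2x} - 2xe^{-x}$ for the largest root and $1 - e^{-2x}$ for the smallest root. The sketch below (illustrative names, Simpson integration assumed adequate for these smooth integrands) compares the general expressions with those closed forms.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def k2(m):
    """K(2, m) = 2^(2m+1) / Gamma(2m+2)."""
    return 2 ** (2 * m + 1) / math.gamma(2 * m + 2)

def largest_root_cdf(x, m):
    """Limiting distribution of n*theta_1 for l = 2."""
    first = 2 * simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    second = x ** (m + 1) * math.exp(-x) * simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    return k2(m) * (first - second)

def smallest_root_cdf(x, m):
    """Limiting distribution of n*theta_2 for l = 2: the largest-root
    expression plus K(2, m) * x^(m+1) * e^(-x) * Gamma(m+1)."""
    extra = k2(m) * x ** (m + 1) * math.exp(-x) * math.gamma(m + 1)
    return largest_root_cdf(x, m) + extra

x = 1.5
largest_closed = 1 - math.exp(-2 * x) - 2 * x * math.exp(-x)   # m = 0 only
smallest_closed = 1 - math.exp(-2 * x)                         # m = 0 only
```

For m = 0 the smallest root of the limiting pair is therefore exponentially distributed with rate 2, and the smallest-root distribution function always lies above that of the largest root, as it must.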

(b) l = 3.

The exact distribution of the smallest root can be expressed as

$$\Pr(\theta_3 \le x) = c(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n) + (0, 3, x, 2, 1, z; m, n)],$$

where z = 1.

Replacing x by x/n and allowing n to tend to infinity we have

$$\lim_{n\to\infty} \Pr(n\theta_3 \le x) = \lim_{n\to\infty} c(3, m, n)[(0, 3, 2, 1, x/n; m, n) + (0, 3, 2, x/n, 1, z; m, n) + (0, 3, x/n, 2, 1, z; m, n)]. \tag{14}$$













The values of these components on the right hand side of the above equation are given below.

$$\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n) = G_{3,m}(x), \text{ given by (12),}$$

$$\lim_{n\to\infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$$
= K(8, m) { [ u"e “du E [ unto dy 
# 0 


z z 

2 — 1 2 2 —2 

(15) ~ g”™ 6 “" of dul — x” &* 2 | pre ae 
0 0 

z eo z 

= _ 2 = _ 
— f ¢ a ue * a 4 of “ unt du | u"e “du 
0 z 0 


0 z \ 
+1 — 2m+2 —2 
- 2 | ue “au [ ue ™ duh, 
x 0 ) 
and 


° ae ( . m —u e m+ —2yu 
lim c(3, m, n)(0, 3, x/n, 2,1, 2; m,n) = K(8, m) is ue“ du }2/ unto dy 
0 z 


_ 


n—-o 


2 eo 3° 
mt+2 — m+l —wu m+2 —z 2m+1 24 +1 — - 

— 2x" “e | ue“ dul— 2”e 2/ “une “du— x” e 7 ue“ du 

z z z 
zr eo z eo 
42 — i 1 ia 3 2m+2 —2 

+2" e s -  . au | ue “du — 2 | - s du [ “ e. du>. 

0 x 0 2 ) 


Substituting in (14) we have, 
lim Pr (n6; < 2) = {2°"*°/[P(m + 1)P(2m + 3)]} 


noo - 


00 
2m+3 —2 Qm+3 —2 
wi aa au [ une “dut af Pa au | u"e “ du 

z 

z oc 

_- 2 2-2 +1 2m+2 -2 
_ 2 fw ‘— au [ ee “du — 2 “a” . du | “me du 
0 “0 z 


wm zx 
+2 - 2m+1 —2  .- 2m+1 20 
— 2r”""e “f une" du — 2x" e “| a” 6a 
0 0 


z oo \ 
Qm+3 —2; “ ae 
+o” ""e~ (| ue “dut | <<” in) ; 
0 0 } 


/ 


to 


Or, 
lim Pr (né; < x) = — + 1)F(2m + 3)] 


nrn—0 


Q2m +4 


} {EQm _ 4 _ - 2n+3 —2 i oa 
) une“ dut2f we“ du] ue“ du 
\ 0 x 


z z 2 
Q2m+2 —21 + — 2m+2 —2 
— 2r(m + 2) r “ue "du — 2 | ae au [ “ue du 
0 z 


Jo 
_ T(2m + 2) 


92 Im+l1 


n+2 2 —2 |* om+1 -2 2m+3 —2 
gett oe rf une ™ du + T(m + 1ja"™ e 


0 


Qm+3 --2r ° m u | 
+2 e ue du re 


/ 







Thus we have seen that this method can be used for obtaining the limiting 
distribution of the smallest root for any value of l. 


6. Limiting distribution of any intermediate root. The above method can also be used for obtaining the limiting distribution of any intermediate root. We shall give the distribution of $\theta_2$ for l = 3. We have

$$\Pr(\theta_2 \le x) = c(3, m, n)\{(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)\}, \tag{16}$$

where z = 1.

The limits $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n)$ and $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$ are given by (12) and (15) respectively. Substituting these results in (16) and simplifying we get


+3 ( « 7 
e Dae ; ee x = . ai a erat eae i ae 2 | m ie 
lim Pr (n2 < 2) Fm + DFQ@m +3) \7 4 wc" du 


zr wom s 
2m+3 —2 m+1 —1 2nm4+2 —2 
fu mre “du — 2 | ue ‘du [ a 
0 0 0 


9 


Zz z 
= 2m+1 —2 2 3 —2 _i 
— 4x7" | ee ™ du + 22°" e* | ue“ du 
0 0 


oe Zz © z 
+2 1, — —_— _ m+1 —1 
+2" ""e* | ue“ du | u"e “du — | ue“ du [ une “ dul, 
x 0 z “0 


Or, 
li P ( 6 me *) a eS or 4 1) [ 2m+3 —2u }: 
— r\n@o NL) = T(m fe Dr(2m a 3) \ m ; u c au 


z z 
. 2m+2 —2 2 — 2m+1 —2 2m+3 —2 
— 2r(m + 2) | ene du — 44°" 6 | ue du + 2x 
0 0 


Zz - z 
cg n+2 —z ba ae om 
[ une“ duta2”"™e”* | ue“ du I ue“ du 
z 0 


0 
oo z 
m —u m+1l1—u 
-| ue du | uo e. dul?. 
z 0 


Thus the limiting distribution of any intermediate root can be obtained by the 
above method. 


7. Further problems. The limiting distribution of the largest root is found to be very helpful in obtaining the distribution of the sum of roots when m = 0. This condition implies that when the results are applied to canonical correlations the numbers of variates in the two sets differ by unity. The distributions for the sum of roots have been derived under the above condition for l = 2, 3 and 4 and the results are being presented in the next paper of this series.


8. Acknowledgements. I am extremely thankful to Drs. P. L. Hsu and 
Harold Hotelling for guidance and help in this research. 













REFERENCES

[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of Eugenics, Vol. 9 (1939), pp. 250-258.

[2] D. N. Nanda, "Distribution of a root of a determinantal equation," Annals of Math. Stat., Vol. 18 (1947).

[3] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and any intermediate of the 'p' statistics on the null hypothesis," Sankhyā, Vol. 8 (1943).




ON A SOURCE OF DOWNWARD BIAS IN THE ANALYSIS OF VARIANCE 
AND COVARIANCE 


By William G. Madow


Institute of Statistics, University of North Carolina 


1. Summary. It is shown that if, in the analysis of variance, the experiments are not in a state of statistical control due to variations in the true means, then the test will have a downward bias. The power function of the analysis of variance test is obtained when this downward bias is present.


2. Introduction. To introduce the discussion of this bias let us consider the generalized Student's hypothesis.

Let $y_{11}, \cdots, y_{kN}$ be normally and independently distributed with variance $\sigma^2$, and let the expected value of $y_{i\nu}$ be $a_{i\nu}$. Then the generalized Student's hypothesis is

(Null hypothesis) $a_{i\nu} = a$

and the class of alternative hypotheses against which the null hypothesis is tested is

(Class A) $a_{i\nu} = a_i$.

From the statement of the null hypothesis and the alternatives of Class A it follows that both the null hypothesis and the alternatives of Class A require that

(1.1) $a_{i1} = \cdots = a_{iN}$.

Since our experiments are rarely in such perfect statistical control that (1.1) holds whether or not the null hypothesis is true, it becomes reasonable to investigate the existing F test when instead of the alternatives to the null hypothesis being of Class A, they are simply Class B:

(Class B) Equation (1.1) is false for at least one value of $i$.

Furthermore, for many practical purposes we would prefer to test the average null hypothesis:

(Average null hypothesis) $\bar{a}_i = \bar{a}$,

where $N\bar{a}_i = a_{i1} + \cdots + a_{iN}$ and $k\bar{a} = \bar{a}_1 + \cdots + \bar{a}_k$, instead of the null hypothesis, the alternatives to the average null hypothesis being of Class C.

(Class C) The $a_{i\nu}$ can have any values such that not all the $\bar{a}_i$ equal $\bar{a}$.

1 Throughout this paper the letter $i$ will assume all integral values from 1 to $k$, the letters $\mu, \nu$ will assume all integral values from 1 to $N$, the letters $\gamma, \eta$ will assume all integral values from 1 to $m$, the letter $\alpha$ will assume all integral values from $n_1 + \cdots + n_{\gamma-1} + 1$ to $n_1 + \cdots + n_\gamma$ ($n_0 = 0$), and $\alpha_1, \alpha_2$ will assume all integral values from 0 to $\infty$.















The F-test of the null hypothesis against the alternatives of Class A is, as is well known,

$$F = \frac{k(N-1)\, N \sum_i (\bar{y}_i - \bar{y})^2}{(k-1) \sum_{i,\nu} (y_{i\nu} - \bar{y}_i)^2},$$

where $N\bar{y}_i = y_{i1} + \cdots + y_{iN}$ and $k\bar{y} = \bar{y}_1 + \cdots + \bar{y}_k$. To answer the questions formulated above concerning the F-test when the average null hypothesis or the alternatives of classes B or C are true, we must then calculate the distribution of F under these various conditions. This is done in Section 3.
A somewhat informal means of obtaining the conclusions is that of studying F itself. Taking the expected values of the numerator and denominator of F and defining

$$\phi_1^2 = \frac{N \sum_i (\bar{a}_i - \bar{a})^2}{(k-1)\sigma^2}, \qquad \phi_2^2 = \frac{1}{k(N-1)\sigma^2} \sum_{i,\nu} (a_{i\nu} - \bar{a}_i)^2,$$

we obtain as the ratio of the two expected values

$$\bar{F} = \frac{1 + \phi_1^2}{1 + \phi_2^2}.$$

It is well known that, in general, the larger the value of N the more closely will F approximate $\bar{F}$, the ratio of the two expected values. From this fact it is easy to see why if the null hypothesis is true, then $F \sim 1$, whereas if the null hypothesis is false but an alternative of Class A is true then

$$F \sim 1 + \phi_1^2 > 1$$

so that large values of F become more likely than if the null hypothesis were true. However, if an alternative of Class B is true then

$$F \sim \frac{1 + \phi_1^2}{1 + \phi_2^2}$$

so that if $\phi_1^2 < \phi_2^2$, smaller values of F occur more frequently than indicated by the null hypothesis. Thus we would tend to accept the null hypothesis more frequently than desired when it is false. Even when the null hypothesis is false so that $\phi_1^2 > 0$, the values of F will tend to be less if $\phi_2^2 > 0$ than if $\phi_2^2 = 0$ whether or not $\phi_1^2 < \phi_2^2$. Not only is the probability of an error of the first kind less than the value $\epsilon$ we may have previously selected, but also the power of the test is less than would be indicated by Tang's tables [1]. The lack of statistical control represented by variation of expected values within a class has the effect of making it less likely than the standard F-test indicates that the null hypothesis will be rejected whether it be true or false. Furthermore, even for relatively low values of $\phi_2^2$, the reductions in the probabilities of rejection may be over 40 per cent as indicated by some examples given below.
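The downward bias can be seen in a small simulation. The sketch below is an illustration and not part of the paper: it takes k = 5 classes of N = 7 observations with unit variance, uses 2.69 as the approximate tabulated 5 per cent point of F with 4 and 30 degrees of freedom, and lets the within-class means follow a linear pattern chosen so that every class average is zero (the average null hypothesis holds) while $\phi_2^2 = 1$. All function and variable names are hypothetical.

```python
import random

def f_statistic(samples):
    """One-way analysis of variance F for k classes of N observations each."""
    k = len(samples)
    n_obs = len(samples[0])
    class_means = [sum(row) / n_obs for row in samples]
    grand_mean = sum(class_means) / k
    between = n_obs * sum((mu - grand_mean) ** 2 for mu in class_means)
    within = sum((y - mu) ** 2 for row, mu in zip(samples, class_means) for y in row)
    return (k * (n_obs - 1) * between) / ((k - 1) * within)

def rejection_rate(within_class_means, k=5, reps=4000, critical=2.69, seed=0):
    """Fraction of simulated experiments in which F exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        samples = [[rng.gauss(a, 1.0) for a in within_class_means] for _ in range(k)]
        if f_statistic(samples) > critical:
            rejections += 1
    return rejections / reps

N = 7
in_control = [0.0] * N                      # (1.1) holds: all true means equal
d = (30 / 140) ** 0.5                       # gives phi_2^2 = 5 * 28 * d^2 / 30 = 1
out_of_control = [d * (nu - 4) for nu in range(1, N + 1)]  # class averages still zero

rate_control = rejection_rate(in_control)
rate_no_control = rejection_rate(out_of_control)
print(rate_control, rate_no_control)
```

Under control the rejection rate should be close to the nominal .05; when the true means vary within classes the rate falls well below it, even though the average null hypothesis is true in both cases.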

If the average null hypothesis is true but (1.1) is false it follows that

$$F \sim \frac{1}{1 + \phi_2^2},$$

so that the full effect of the downward bias occurs in that case. Thus in cases where statistical control is lacking, to test the average null hypothesis by the F-test may well result in accepting the hypothesis when it is false. If the null hypothesis is rejected, however, then we can expect that the differences among the true means are even larger than indicated by Tang's tables.

To illustrate, it is shown in Section 4 that if k = 5 and N = 7, then the probability of rejecting the average null hypothesis when it is true, but (1.1) is false, will not be the preassigned .05 but something less than .03 if $\phi_2^2 > .05$. Furthermore, if $\phi_2^2 > .07$, then the power of the F tests for this example will be reduced by at least 40 per cent whatever the value of $\phi_1^2$.

The conclusions reached above remain valid for the analysis of variance and covariance in general. In the general case however, the value of the average null hypothesis in simplifying the analysis may be considerably reduced since the parameter $\phi_1^2$ no longer vanishes when the average null hypothesis is true. For example, if $Ey_\nu = \beta_\nu x_\nu$, and if the average null hypothesis is $\bar{\beta} = 0$, where $N\bar{\beta} = \beta_1 + \cdots + \beta_N$, then upon calculating

$$\phi_1^2 = \frac{(\sum_\nu \beta_\nu x_\nu^2)^2}{\sigma^2 \sum_\nu x_\nu^2}$$

we see that $\phi_1^2$ will not vanish in general if $\bar{\beta}$ vanishes.

Although as shown above the average null hypothesis may not have too great 
importance in the case of regression, yet if the ‘“‘variance between treatments”’ 
is a function of arithmetic means of the random variables as in the ‘‘pure” 
analysis of variance the average null hypothesis may well be very useful. Simple 
examples of this are provided by the randomized block, Latin square, and similar 
designs. 

The distributions that we shall need are given in Section 3. The inequalities 
on the basis of which the bias is demonstrated are obtained in Section 4. 

It would be highly desirable to have Tang’s tables extended so that they might 
provide the answers to the questions raised by this source of bias. In the ab- 
sence of such extensions the inequalities of Section 4 may give some rough 
idea, but these inequalities are not sharp enough. 




3. The calculation of the distributions. The following theorem was proved, although not explicitly stated, as part of an earlier note [2]. (Note the change from $x_i$ to $y_i$ as the notation for the random variable.)

THEOREM 1. Let $y_1, \cdots, y_N$ be normally and independently distributed with variance $\sigma^2$ and means $a_1, \cdots, a_N$ and let $q_1, \cdots, q_m$ be quadratic forms

$$q_\gamma = \sum_{\mu,\nu} a_{\mu\nu}^{(\gamma)} y_\mu y_\nu$$

in $y_1, \cdots, y_N$ of ranks $n_1, \cdots, n_m$. Then, if an orthogonal transformation

$$z_\mu = \sum_\nu c_{\mu\nu} y_\nu$$

exists such that

$$q_\gamma = \sum_\alpha z_\alpha^2, \tag{2.1}$$

it follows that the random variables $q_\gamma/\sigma^2$ are independently distributed in $\chi'^2$ distributions with degrees of freedom $n_1, \cdots, n_m$ and parameters $\lambda_1, \cdots, \lambda_m$, where

$$\lambda_\gamma = \frac{1}{2\sigma^2} \sum_{\mu,\nu} a_{\mu\nu}^{(\gamma)} a_\mu a_\nu = \frac{1}{2}\left(\frac{Eq_\gamma}{\sigma^2} - n_\gamma\right).$$

Various conditions for the existence of an orthogonal transformation satisfying (2.1) of Theorem 1 have been given. Among these are:

1. Cochran's [3] condition. If $\sum_\gamma q_\gamma = \sum_\nu y_\nu^2$, then a necessary and sufficient condition for the existence of an orthogonal transformation satisfying (2.1) is $\sum_\gamma n_\gamma = N$.

2. Craig's [4] condition. If $A_\gamma$ denotes the matrix $(a_{\mu\nu}^{(\gamma)})$ then a necessary and sufficient condition for the existence of an orthogonal transformation satisfying (2.1) is $A_\gamma A_\eta = \delta_{\gamma\eta} A_\gamma$ where $\delta_{\gamma\eta}$ is the null matrix if $\gamma \ne \eta$ and the identity matrix if $\gamma = \eta$.

3. Linear Hypothesis condition. (Kolodziejczyk [5]) If $\lambda$ be the likelihood ratio test of a linear hypothesis and if $E^2 = 1 - \lambda^{2/N}$, then $E^2 = q_1/(q_1 + q_2)$ and an orthogonal transformation exists satisfying (2.1) with m = 2.
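Cochran's and Craig's conditions can be verified directly for the one-way classification. The sketch below is illustrative only: it builds, in exact rational arithmetic, the matrices of the between-class form, the within-class form and the grand-mean form (the third is added here so that the forms sum to $\sum y^2$), for an assumed layout of k = 3 classes with N = 2 observations each. The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][j] * b[j][c] for j in range(n)) for c in range(n)] for r in range(n)]

def subtract(p, q):
    return [[x - y for x, y in zip(pr, qr)] for pr, qr in zip(p, q)]

def one_way_forms(k, n_obs):
    """Matrices of q1 (between classes), q2 (within classes), q3 (grand mean)."""
    size = k * n_obs
    identity = [[Fraction(int(r == c)) for c in range(size)] for r in range(size)]
    # Averaging within each class: entry 1/N when both indices lie in the same class.
    block = [[Fraction(1, n_obs) if r // n_obs == c // n_obs else Fraction(0)
              for c in range(size)] for r in range(size)]
    grand = [[Fraction(1, size)] * size for _ in range(size)]
    return subtract(block, grand), subtract(identity, block), grand

k, n_obs = 3, 2
forms = one_way_forms(k, n_obs)
size = k * n_obs
zero = [[Fraction(0)] * size for _ in range(size)]

# Craig's condition: A_g A_h equals A_g when g == h and the null matrix otherwise.
craig_holds = all(
    matmul(forms[g], forms[h]) == (forms[g] if g == h else zero)
    for g in range(3) for h in range(3)
)
# For these idempotent matrices the rank equals the trace.
ranks = [sum(a[r][r] for r in range(size)) for a in forms]
```

The ranks come out as k - 1, k(N - 1) and 1, which add up to the number of observations kN, in agreement with Cochran's condition.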

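To summarize some results obtained by Tang [1], let us state

THEOREM 2. If $\chi_1'^2$ and $\chi_2'^2$ are independently distributed in $\chi'^2$ distributions with $n_1$ and $n_2$ degrees of freedom and parameters $\lambda_1$ and $\lambda_2$, and if

$$E^2 = \frac{\chi_1'^2}{\chi_1'^2 + \chi_2'^2}$$

then the probability density of $E^2$ is

$$p = p(E^2 \mid \lambda_1, \lambda_2, n_1, n_2) = e^{-\lambda_1-\lambda_2} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_1, \alpha_2} \frac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}}{\alpha_1!\, \alpha_2!} \cdot \frac{\Gamma\left(\frac{n_1+n_2}{2} + \alpha_1 + \alpha_2\right)}{\Gamma\left(\frac{n_1}{2} + \alpha_1\right) \Gamma\left(\frac{n_2}{2} + \alpha_2\right)} (E^2)^{\alpha_1} (1 - E^2)^{\alpha_2}. \tag{2.2}$$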










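By assigning certain values to $\lambda_1$ and $\lambda_2$ we obtain the following special cases of (2.2)

$$p_1 = p(E^2 \mid \lambda_1, 0, n_1, n_2) = e^{-\lambda_1} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_1} \frac{\lambda_1^{\alpha_1}\, \Gamma\left(\frac{n_1+n_2}{2} + \alpha_1\right)}{\alpha_1!\, \Gamma\left(\frac{n_1}{2} + \alpha_1\right) \Gamma\left(\frac{n_2}{2}\right)} (E^2)^{\alpha_1}, \tag{2.3}$$

$$p_2 = p(E^2 \mid 0, \lambda_2, n_1, n_2) = e^{-\lambda_2} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1} \sum_{\alpha_2} \frac{\lambda_2^{\alpha_2}\, \Gamma\left(\frac{n_1+n_2}{2} + \alpha_2\right)}{\alpha_2!\, \Gamma\left(\frac{n_1}{2}\right) \Gamma\left(\frac{n_2}{2} + \alpha_2\right)} (1 - E^2)^{\alpha_2}, \tag{2.4}$$

$$p_0 = p(E^2 \mid 0, 0, n_1, n_2) = \frac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right) \Gamma\left(\frac{n_2}{2}\right)} (E^2)^{(n_1/2)-1} (1 - E^2)^{(n_2/2)-1}. \tag{2.5}$$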

It is noted that (2.3) is Tang's distribution (112) upon which the calculations of his tables were based. To see this we need only make the correspondence

This paper          Tang
$\lambda_1$         $\lambda$
$n_1, n_2$          $f_1, f_2$
$\alpha_1$          $i$


We define $\epsilon$ to be the probability of an error of the first kind. Tang obtained the critical values $E_\epsilon^2$ of $E^2$ by requiring that

$$P_I = \int_{E_\epsilon^2}^{1} p_0\, dE^2 = \epsilon, \text{ say .01 or .05.}$$

Then he calculated

$$P_{II} = \int_0^{E_\epsilon^2} p_1(E^2 \mid \lambda_1, 0, n_1, n_2)\, dE^2$$

using the values of $E_\epsilon^2$ obtained above. Hence $1 - P_{II}$ is the power of the test. If, however, $\lambda_1 = 0$ but $\lambda_2 \ne 0$, then to find

$$P_{III} = \int_{E_\epsilon^2}^{1} p_2(E^2 \mid 0, \lambda_2, n_1, n_2)\, dE^2$$

we could make the transformation $G^2 = 1 - E^2$ and find

$$P_{III} = \int_0^{1-E_\epsilon^2} p(G^2 \mid 0, \lambda_2, n_1, n_2)\, dG^2.$$

It is easy to verify that

$$p(G^2 \mid 0, \lambda_2, n_1, n_2) = p_1(E^2 \mid \lambda_2, 0, n_2, n_1)$$

if we put $G^2$ in place of $E^2$ in the latter density. It follows that to calculate $P_{III}$ it would be sufficient to have full tables of Tang's distribution since

$$P_{III} = \int_0^{1-E_\epsilon^2} p_1(E^2 \mid \lambda_2, 0, n_2, n_1)\, dE^2.$$

Tang's tables are not however sufficiently extensive. Furthermore, tables of (2.2) are also necessary. As yet these tables do not exist. However, some useful conclusions can be drawn from the inequalities obtained in the following section.

First, however, let us evaluate $n_1$, $n_2$, $\lambda_1$ and $\lambda_2$ for the generalized Student's hypothesis discussed in the introduction. It is easy to see that $n_1 = k - 1$ and $n_2 = k(N - 1)$. To evaluate $\lambda_1$ and $\lambda_2$ we note from Theorem 1 that we only need substitute $Ey_{i\nu}$ for $y_{i\nu}$ in $q_1$ and $q_2$ where

$$q_1 = N \sum_i (\bar{y}_i - \bar{y})^2,$$
$$q_2 = \sum_{i,\nu} (y_{i\nu} - \bar{y}_i)^2.$$

Upon making these substitutions we obtain

$$\lambda_1 = \frac{N}{2\sigma^2} \sum_i (\bar{a}_i - \bar{a})^2,$$
$$\lambda_2 = \frac{1}{2\sigma^2} \sum_{i,\nu} (a_{i\nu} - \bar{a}_i)^2.$$

Thus the various hypotheses concerning the $a_{i\nu}$ influence the distribution of F or

$$E^2 = \frac{q_1}{q_1 + q_2} = \frac{F n_1/n_2}{1 + F n_1/n_2}$$

by affecting the values of $\lambda_1$ and $\lambda_2$.


4. Limits of the values of p. It follows readily from (2.2) that, 


(*: + ") 
ri =a ™ 
2 ‘ (EB?) "2-14 ro ee 


an r(3)r(8) 


where 





ia NET an . 
‘a Aire - 1 2 + (E') (1 7 E*)*? Cn 


a1,a9Q ay! 


r (4 + no imal cx) I ‘ ) r (“") 
or ae ee, 
or ny 9 Ni + Mm 

(4a) r(B+a)r(™E™) 













Now ifa > 0,b > 0, and 7 isan integer > 1, we have 


a a a a\? 
(: . i) * 53) (1 ' sFa=5) . (1 m i) 


Hence, it follows that 


1<¢ “ (*: 4. _ (* + n+ i 
a a a 


nN Ne 
Substituting we see that 


—\y-Ag Ay E2+X9(1—E?2) —— 
“se Spspe" 


2 
(3.2) — {ue (nia + ") exp E= =] or B(™ = ne) 


and 


(3.3) pi gree « p < piexp Ee + (1 — E nmin + He) 2™|. 


Let 2neo; = 14, 7 = 1, 2. 





1 
THEOREM 3. Lete = / , Po dE” so that ¢ is the probability of an error of the first 
Ee 
kind. Then, for all values of $3 
1 
(3.4) «> / po dE 
Ei 
and if E? > ni/(n. + ne), it follows that 
1 
(3.5) € > eexp{—2ned3 + 263(1 — E2)(m: + m)} > I, p, dE” > ee "**2, 
Ey 
Furthermore, for all values of $3 
1 1 
(3.6) pidk* > |. pak’, 
Ee - 
and if E* > (ny + 2)/(m + ne), it follows that 


1 1 
[. pi dE, > exp{ —2n2o; + 263(1 — E%)(n, + ne) 263} i p: dE” 
(3.7) € € 


1 1 
> [. p dE’ > etntt f p: dE’. 
Ee Ee 
Finally, if y can assume the two values 0 and 2, it follows that if 


‘ ee 
(3.8) $2 > 2(E2(n, + me) — (m + 7) > 





then if y = 


1 
(3.9) [. p2 dE” < 6 
Ee 








and if y = 2 
(3.10) [ / pdE’ <6 [ : p: dE’. 
PROOF. To prove (3.4) and (3.6) it is only necessary to follow Daly’s [6] 


procedure. Since 
exp{—2nod2 + 263(1 — E)(mi + me) + vos} 
and 
exp {— n2¢3E"} 


are decreasing functions of E”, and 


exp {— 2ned2 + 2ga(1 — E*)(m + ne) + 92} <1 
if 
2 m+ ¥ 
* *a ee 
the inequalities (3.5) and (3.7) follow immediately from (3.2) and (3.3). Finally 
exp {— 2nd + 262(1 — E’)(m + 2) + 92} <6<1 


if (3.8) is true, so that (3.9) and (3.10) follow. 

From (3.8), (3.9) and (3.10) we can calculate either a lower limit for the bias, if we know $\phi_2$, or the upper limit that $\phi_2$ can have if we wish the bias to be not greater than some given amount. Thus these limits do not answer the important question of what is a value $\phi$ such that if $\phi_2 < \phi$ then the bias is less than $(1 - \delta)\epsilon$. They only provide a value $\phi'$ of $\phi_2$ such that if $\phi_2 > \phi'$ then the bias is at least $(1 - \delta)\epsilon$.


If, for example, $\delta = .5$ and $n_1 = 1$ as in the case of Student's ratio, we have if $\gamma = 0$

$$\phi_2^2 > \frac{.693}{2((n_2 + 1)E_\epsilon^2 - 1)}$$

and if $\epsilon = .05$, then $E_\epsilon^2$ decreases steadily from .903 if $n_2 = 2$, to .063 if $n_2 = 60$ and the corresponding lower limits of $\phi_2^2$ decrease from .43 to .12. Thus, if $\phi_2^2 > .43$ or .12 in these two cases, it follows that the probability of rejecting the average null hypothesis will be not .05 but something less than .025.


If $\delta = .6$ and $n_1 = 4$, $n_2 = 30$ then we can evaluate the lower limit of $\phi_2^2$ for the example given in the introduction, finding that


O11 . 


* > 309798) -8 °° 


implies a downward bias of at least 40 per cent of .05. Also, if $\phi_2^2 > .07$ then for any value of $\phi_1$ the power of the analysis of variance test is reduced at least 40 per cent.

2 The procedure followed is given in [6] on pp. 4, 5, equations (2.2) through Lemma 1.


5. Conclusions. The rather sharp effects of a moderate lack of statistical control on the probabilities associated with the F-test indicate the importance of testing for statistical control outside of the industrial applications now made. Furthermore, it would seem advisable to investigate tests and designs that are less sensitive to the lack of control than is the F-test.


REFERENCES

[1] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-157.

[2] William G. Madow, "The distribution of quadratic forms in non-central normal random variables," Annals of Math. Stat., Vol. 11 (1940), pp. 100-104.

[3] W. G. Cochran, "The distribution of quadratic forms in a normal system, with applications to the analysis of covariance," Cambridge Phil. Soc. Proc., Vol. 30 (1934), pp. 178-191.

[4] A. T. Craig, "Note on the independence of certain quadratic forms," Annals of Math. Stat., Vol. 14 (1943), pp. 195-197.

[5] S. Kolodziejczyk, "On an important class of statistical hypotheses," Biometrika, Vol. 27 (1935), pp. 161-190.

[6] J. F. Daly, "On the unbiased character of likelihood-ratio tests for independence in normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-33. The procedure followed is given on pp. 4, 5, equations (2.2) through Lemma 1.





MIXTURE OF DISTRIBUTIONS 


By Herbert Robbins


Department of Mathematical Statistics, University of North Carolina 


1. Summary. Mixtures of measures or distributions occur frequently in the 
theory and applications of probability and statistics. In the simplest case it 
may, for example, be reasonable to assume that one is dealing with the mixture 
in given proportions of a finite number of normal populations with different 
means or variances. The mixture parameter may also be denumerably infinite, 
as in the theory of sums of a random number of random variables, or continuous, 
as in the compound Poisson distribution. 


The operation of Lebesgue-Stieltjes integration, $\int f(x)\, d\mu$, is linear with respect to both integrand $f(x)$ and measure $\mu$. The first type of linearity has as its continuous analog the theorem of Fubini on interchange of order of integration; the second type of linearity has a corresponding continuous analog which is of importance whenever one deals with mixtures of measures or distributions, and which forms the subject of the present paper. Other treatments of the same subject have been given ([1], [2]; see also [3], [4]) but it is hoped that the discussion given here will be useful to the mathematical statistician.

A general measure theoretic form of the fundamental theorem is given in 
Section 2, and in Section 3 the theorem is formulated in terms of finite dimen- 
sional spaces and distribution functions. The operation of convolution as an 
example of mixture is treated briefly in Section 4, while Section 5 is devoted to 
random sampling from a mixed population. 

We shall refer to Theory of the Integral by S. Saks (second edition, Warszawa, 
1937) as [S], and the Mathematical Methods of Statistics by H. Cramér (Prince- 
ton, 1946) as [C]. 

2. Mixture of measures in general. Let $X(Y)$ be a space with points $x(y)$ and let $\mathfrak{X}(\mathfrak{Y})$ be a $\sigma$-field of subsets of $X(Y)$. Let $\nu$ be a measure on $\mathfrak{Y}$. Let $\mu_y$ be for a.e. $(\nu)$ $y$ a measure on $\mathfrak{X}$, such that $\mu_y(S)$ is for every $S$ in $\mathfrak{X}$ a measurable $(\mathfrak{Y})$ function of $y$. Define for every $S$ in $\mathfrak{X}$,

$$\mu(S) = \int_Y \mu_y(S)\, d\nu. \tag{1}$$

THEOREM 1. $\mu$ is a measure on $\mathfrak{X}$. If $\nu(Y) = \mu_y(X) = 1$, then $\mu(X) = 1$.
PROOF. Clear.

THEOREM 2. If $f(x)$ is any non-negative or non-positive function measurable $(\mathfrak{X})$ then the function

$$g(y) = \int_X f(x)\, d\mu_y \tag{2}$$

is measurable $(\mathfrak{Y})$, and

$$\int_X f(x)\, d\mu = \int_Y g(y)\, d\nu. \tag{3}$$
PROOF. First let $f_0(x)$ be any non-negative simple function [S, p. 7] of the form

$$f_0(x) = \{a_1, S_1; \cdots; a_n, S_n\} \tag{4}$$

where the $S_i$ are disjoint sets in $\mathfrak{X}$ such that $X = \sum_1^n S_i$ and the $a_i$ are non-negative constants. Then

$$g_0(y) = \int_X f_0(x)\, d\mu_y = \sum_1^n a_i\, \mu_y(S_i) \tag{5}$$

is a non-negative function measurable $(\mathfrak{Y})$, and from (1) it follows that each side of (3) is equal to $\sum_1^n a_i\, \mu(S_i)$. Hence the theorem holds in this case.

Next let $f(x)$ be any non-negative function measurable $(\mathfrak{X})$; then [S, p. 14] there exists a sequence $f_n(x)$ of simple functions such that for every $x$,

$$0 \le f_1(x) \le f_2(x) \le \cdots; \qquad \lim_{n\to\infty} f_n(x) = f(x). \tag{6}$$

Setting

$$g_n(y) = \int_X f_n(x)\, d\mu_y, \qquad g(y) = \int_X f(x)\, d\mu_y, \tag{7}$$

it follows from the theorem of monotone convergence [S, p. 28] and from the preceding paragraph that

$$\int_X f(x)\, d\mu = \lim_{n\to\infty} \int_X f_n(x)\, d\mu = \lim_{n\to\infty} \int_Y g_n(y)\, d\nu, \tag{8}$$

$$g(y) = \lim_{n\to\infty} \int_X f_n(x)\, d\mu_y = \lim_{n\to\infty} g_n(y). \tag{9}$$

From (6) and (9) it follows that for a.e. $(\nu)$ $y$,

$$0 \le g_1(y) \le g_2(y) \le \cdots; \qquad \lim_{n\to\infty} g_n(y) = g(y). \tag{10}$$

Hence $g(y)$ is measurable $(\mathfrak{Y})$, and from the theorem of monotone convergence,

$$\int_Y g(y)\, d\nu = \lim_{n\to\infty} \int_Y g_n(y)\, d\nu. \tag{11}$$

Equation (3) now follows from (8) and (11).

By passing from $f(x)$ to $-f(x)$ we establish (3) when $f(x)$ is any non-positive function measurable $(\mathfrak{X})$. This completes the proof of Theorem 2.
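For a finite mixture the content of Theorem 2 reduces to an interchange of two finite sums, which the following sketch illustrates with exact rational arithmetic. The spaces, weights and integrand are arbitrary choices made for the illustration, and the names `mixture` and `integrate` are hypothetical.

```python
from fractions import Fraction

# nu puts weights on the two values of the mixing parameter y;
# each mu_y is a probability measure on X = {0, 1, 2}.
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
mu_y = {
    "a": {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)},
    "b": {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)},
}

def mixture(nu, mu_y):
    """The mixed measure of (1): mu({x}) = sum over y of nu({y}) * mu_y({x})."""
    points = next(iter(mu_y.values())).keys()
    return {x: sum(nu[y] * mu_y[y][x] for y in nu) for x in points}

def integrate(f, measure):
    """Integral of f with respect to a measure on a finite space."""
    return sum(f(x) * mass for x, mass in measure.items())

f = lambda x: x * x          # a non-negative integrand
mu = mixture(nu, mu_y)

lhs = integrate(f, mu)                                  # left side of (3)
rhs = sum(nu[y] * integrate(f, mu_y[y]) for y in nu)    # right side of (3)
```

Both sides come out equal to 11/8 here, and the mixed measure has total mass one, in line with Theorem 1.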
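If $f(x)$ is an arbitrary function measurable $(\mathfrak{X})$ we define

$$f^+(x) = \begin{cases} f(x) & \text{if } f(x) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad f^-(x) = \begin{cases} f(x) & \text{if } f(x) < 0 \\ 0 & \text{otherwise} \end{cases} \tag{12}$$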
















so that

$$f(x) = f^+(x) + f^-(x) \tag{13}$$

is the sum of two functions measurable $(\mathfrak{X})$ of constant sign. By Theorem 2 the functions

$$g_1(y) = \int_X f^+(x)\, d\mu_y, \qquad g_2(y) = \int_X f^-(x)\, d\mu_y \tag{14}$$

are measurable $(\mathfrak{Y})$ and

$$0 \le \int_X f^+(x)\, d\mu = \int_Y g_1(y)\, d\nu \le \infty, \tag{15}$$

$$0 \ge \int_X f^-(x)\, d\mu = \int_Y g_2(y)\, d\nu \ge -\infty. \tag{16}$$

The integral $\int_X f(x)\, d\mu$ exists if and only if at least one of the two quantities (15) and (16) is finite [S, p. 20].

THEOREM 3. A necessary and sufficient condition that

$$\int_X f(x)\, d\mu = \int_Y \left\{ \int_X f(x)\, d\mu_y \right\} d\nu \tag{17}$$

is that at least one of the two quantities (15) and (16) be finite.

PROOF. By the remark preceding Theorem 3 the condition is clearly necessary. Now suppose, e.g., that (15) is finite; we must show that (17) holds. By hypothesis,

$$\int_X f^+(x)\, d\mu < \infty, \qquad \int_X f(x)\, d\mu = \int_X f^+(x)\, d\mu + \int_X f^-(x)\, d\mu. \tag{18}$$

From (18) and (15) it follows that $0 \le g_1(y) < \infty$ for a.e. $(\nu)$ $y$; hence

$$\int_X f(x)\, d\mu_y = \int_X f^+(x)\, d\mu_y + \int_X f^-(x)\, d\mu_y = g_1(y) + g_2(y) \tag{19}$$

exists for a.e. $(\nu)$ $y$. From the finiteness of (15) it follows that

$$\int_Y (g_1(y) + g_2(y))\, d\nu = \int_Y g_1(y)\, d\nu + \int_Y g_2(y)\, d\nu \tag{20}$$

exists. Hence from (19), the integral

$$\int_Y \left\{ \int_X f(x)\, d\mu_y \right\} d\nu = \int_Y (g_1(y) + g_2(y))\, d\nu \tag{21}$$

exists. Equation (17) now follows from (21), (20), (15), and (18). This completes the proof of Theorem 3.

COROLLARY 1. If $\mu(X) < \infty$, and if f(x) is bounded from above or from below, then both sides of (17) exist and the equality holds.

PROOF. If, say, $f(x) \le C < \infty$, then
$0 \le \int_X f^{+}(x)\,d\mu \le C \cdot \mu(X) < \infty,$
and the result follows from Theorem 3.

We shall now show by an example that the existence and even the finiteness of the right side of (17) does not imply the existence of the left side.

Let X = Y = {1, 2, ···, n, ···} and let $\mathfrak{X}$ ($\mathfrak{Y}$) consist of all subsets of X (Y). Let $\nu$ be the measure which assigns mass $c_n$ to n, where the $c_n$ are positive constants such that $\sum_{1}^{\infty} c_n = 1$. Let $\mu_n$ assign the mass 1/2n to each of the points 1, 2, ···, 2n. Let f(x) be such that $f(1) = b_1$, $f(2) = -b_1$, $f(3) = b_2$, $f(4) = -b_2$, ··· where the $b_n$ are positive constants. Then
$\int_X f(x)\,d\mu_n = 0 \qquad (n = 1, 2, \cdots),$
so that
$\int_Y \left\{ \int_X f(x)\,d\mu_n \right\} d\nu = 0.$
The measure $\mu$ defined by (1) assigns to each n a positive value $\mu(n)$ given by
$\mu(1) = \mu(2) = c_1 \cdot (2)^{-1} + c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$
$\mu(3) = \mu(4) = c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$
$\cdots$
where $\mu(X) = \sum_{1}^{\infty} \mu(n) = \sum_{1}^{\infty} c_n = 1$.
Now fix the $b_n$ and $c_n$ in such a way that
$b_1 \cdot \mu(1) + b_2 \cdot \mu(3) + b_3 \cdot \mu(5) + \cdots = \infty.$
Then
$\int_X f^{+}(x)\,d\mu = -\int_X f^{-}(x)\,d\mu = \infty,$
so that the left side of (17) does not exist, even though $\nu(Y) = \mu_n(X) = \mu(X) = 1$ and the right side of (17) exists and is equal to zero.
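
The counterexample can be explored numerically. The following is a minimal sketch, not part of the original argument: the particular constants $c_n = 2^{-n}$ and $b_k = 4^k$ are assumptions chosen only so that the series $b_1\mu(1) + b_2\mu(3) + \cdots$ diverges, and the function names are hypothetical. It checks with exact rational arithmetic that every inner integral vanishes while a lower bound for the integral of $f^{+}$ grows without limit.

```python
from fractions import Fraction

def c(n):
    """Mass that nu assigns to y = n (assumed choice; positive, sums to 1)."""
    return Fraction(1, 2 ** n)

def b(k):
    """Positive constants b_k (assumed choice, fast enough growth to diverge)."""
    return Fraction(4 ** k)

def f(x):
    """f(1) = b_1, f(2) = -b_1, f(3) = b_2, f(4) = -b_2, ..."""
    k = (x + 1) // 2
    return b(k) if x % 2 == 1 else -b(k)

def inner_integral(n):
    """Integral of f with respect to mu_n (mass 1/2n on each of 1, ..., 2n)."""
    return sum(f(x) for x in range(1, 2 * n + 1)) * Fraction(1, 2 * n)

def mu_lower_bound(x, n_max):
    """Truncation of mu({x}) = sum over n >= ceil(x/2) of c_n / (2n)."""
    k = (x + 1) // 2
    return sum(c(n) * Fraction(1, 2 * n) for n in range(k, n_max + 1))

def positive_part_lower_bound(k_max):
    """Lower bound for the integral of f+ with respect to mu."""
    return sum(b(k) * mu_lower_bound(2 * k - 1, k_max)
               for k in range(1, k_max + 1))
```

Calling `inner_integral(n)` for any n returns exactly zero, whereas `positive_part_lower_bound(k_max)` increases with `k_max`, in line with the divergence required of the left side of (17).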


3. A restatement of the preceding results in the form most useful in probability theory. Let $x = (x_1, \cdots, x_n)$ be a point in the n-dimensional Euclidean space $R_n$, and let $B_n$ denote the σ-field of Borel sets in $R_n$. Let $S_x$ denote the half-open interval in $R_n$ consisting of all points $(w_1, \cdots, w_n)$ in $R_n$ satisfying the inequalities
(22)  $w_1 \le x_1, \cdots, w_n \le x_n;$
then if $\mu$ is any probability measure on $B_n$ the function
(23)  $F(x) = \mu(S_x)$
is the distribution function corresponding to $\mu$. Conversely, if F(x) is any distribution function in $R_n$ [C, p. 80] there is a unique probability measure $\mu$ on $B_n$ such that (23) holds. As a matter of notation we write for any Borel measurable f(x),
(24)  $\int_{R_n} f(x)\,d\mu = \int_{-\infty}^{\infty} f(x)\,dF(x)$
provided the integral on the left exists.

Now let $y = (y_1, \cdots, y_m)$ be a point in $R_m$, let G(y) be a distribution function, and let $\nu$ denote the corresponding probability measure on $B_m$. Let F(x, y) be for a.e. ($\nu$) y a distribution function in x, and for every x a Borel measurable function of y, and let $\mu_y$ be the corresponding probability measure on $B_n$.

THEOREM 4. The function
(25)  $H(x) = \int_{-\infty}^{\infty} F(x, y)\,dG(y)$
is a distribution function in $R_n$. Let $\mu$ denote the corresponding probability measure on $B_n$. Then for any S in $B_n$, $\mu_y(S)$ is a Borel measurable function of y and
(26)  $\mu(S) = \int_{-\infty}^{\infty} \mu_y(S)\,dG(y).$
PROOF. Let C denote the class of all Borel sets S in $R_n$ such that $\mu_y(S)$ is a Borel measurable function of y. We shall show that C is a normal class [S, p. 83].

(i) If $S_1, S_2, \cdots$ is a sequence of disjoint sets in C and if $S = \sum_{1}^{\infty} S_n$, then
$\mu_y(S) = \mu_y\left(\sum_{1}^{\infty} S_n\right) = \sum_{1}^{\infty} \mu_y(S_n)$
is a convergent series of Borel measurable functions and is therefore itself a Borel measurable function.

(ii) If $S_1 \supset S_2 \supset \cdots$ is a decreasing sequence of sets in C and if $S = \prod_{1}^{\infty} S_n$, then
$\mu_y(S) = \mu_y\left(\prod_{1}^{\infty} S_n\right) = \lim_{n\to\infty} \mu_y(S_n)$
is the limit of a sequence of Borel measurable functions and is therefore a Borel measurable function.

Hence C is a normal class. But C contains every interval $S_x$, for $\mu_y(S_x) = F(x, y)$ was assumed to be a Borel measurable function of y for every x. It follows [S, p. 85] that $C = B_n$.

It now follows from Theorem 1 that the set function $\mu(S)$ defined by (26) is a probability measure on $B_n$. The corresponding distribution function is the function H(x) defined by (25). Thus Theorem 4 is proved.







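Let $f(x) = f^{+}(x) + f^{-}(x)$ be any Borel measurable function. Then from Theorem 2, the integrals
(27)  $\int_{-\infty}^{\infty} f^{+}(x)\,dH(x) = \int_{-\infty}^{\infty} f^{+}(x)\,d_x\left\{ \int_{-\infty}^{\infty} F(x, y)\,dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f^{+}(x)\,d_x F(x, y) \right\} dG(y),$
(28)  $\int_{-\infty}^{\infty} f^{-}(x)\,dH(x) = \int_{-\infty}^{\infty} f^{-}(x)\,d_x\left\{ \int_{-\infty}^{\infty} F(x, y)\,dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f^{-}(x)\,d_x F(x, y) \right\} dG(y)$
exist. The following theorem is an immediate consequence of Theorem 3 and Corollary 1.

THEOREM 5. A necessary and sufficient condition that
(29)  $\int_{-\infty}^{\infty} f(x)\,d_x\left\{ \int_{-\infty}^{\infty} F(x, y)\,dG(y) \right\} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f(x)\,d_x F(x, y) \right\} dG(y)$
is that the left side of (29) exist; i.e. that at least one of the quantities (27) and (28) be finite. This will be true in particular if f(x) is bounded from above or from below.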


4. The operation of convolution. An example of the general mixture (25) of distribution functions is the operation of convolution: if F(x), G(x) are two distribution functions in $R_1$ then F(x, y) = F(x − y) satisfies the conditions of Theorem 4, so that
(30)  $H(x) = \int_{-\infty}^{\infty} F(x - y)\,dG(y)$
is also a distribution function in $R_1$, denoted by
(31)  $H(x) = F(x) * G(x).$

Corresponding to any distribution function F(x) in $R_1$ is the characteristic function
(32)  $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$
which in turn uniquely determines F(x) [C, p. 93].

THEOREM 6. Let F(x), G(x), H(x) be distribution functions in $R_1$ and let $\varphi_1(t)$, $\varphi_2(t)$, $\varphi(t)$ be the corresponding characteristic functions. Then
(33)  $H(x) = F(x) * G(x)$
if and only if
(34)  $\varphi(t) = \varphi_1(t) \cdot \varphi_2(t).$









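PROOF. Assume (33) holds. Since $|e^{itx}| \le 1$ we have from Theorem 5,
(35)  $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,d_x\left\{ \int_{-\infty}^{\infty} F(x - y)\,dG(y) \right\}$
  $= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} e^{itx}\,d_x F(x - y) \right\} dG(y)$
  $= \int_{-\infty}^{\infty} e^{ity} \left\{ \int_{-\infty}^{\infty} e^{it(x - y)}\,d_x F(x - y) \right\} dG(y)$
  $= \int_{-\infty}^{\infty} e^{ity} \left\{ \int_{-\infty}^{\infty} e^{itx}\,dF(x) \right\} dG(y) = \varphi_1(t) \cdot \varphi_2(t).$
The converse implication now follows from the fact that the characteristic function of a distribution determines the latter uniquely.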
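
For distributions concentrated on finitely many points the integrals in (30) and (32) reduce to sums, and the forward implication of Theorem 6 can be checked directly. The sketch below is only an illustration under that discreteness assumption; the two distributions `F` and `G` and the helper names are hypothetical.

```python
import cmath

def convolve(p, q):
    """Distribution of the convolution of two discrete distributions.

    p and q map each mass point to its probability; the result is the
    discrete analogue of H = F * G in (30).
    """
    h = {}
    for x, px in p.items():
        for y, qy in q.items():
            h[x + y] = h.get(x + y, 0.0) + px * qy
    return h

def char_fn(p, t):
    """Characteristic function (32) of a discrete distribution."""
    return sum(mass * cmath.exp(1j * t * x) for x, mass in p.items())

# Two hypothetical discrete distributions.
F = {0: 0.5, 1: 0.5}
G = {0: 0.2, 1: 0.3, 3: 0.5}
H = convolve(F, G)
```

For any real t, `char_fn(H, t)` agrees with `char_fn(F, t) * char_fn(G, t)` up to rounding error, which is relation (34).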

The importance of the operation * in probability theory arises from the fact that if X, Y are independent random variables with respective distribution functions F(x), G(x), and if Z = X + Y, then the distribution function H(x) of Z satisfies (33), since for any value of a,
(36)  $H(a) = P[X + Y \le a] = \iint_{x + y \le a} dF(x)\,dG(y)$
  $= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{a - y} dF(x) \right\} dG(y) = \int_{-\infty}^{\infty} F(a - y)\,dG(y) = F(a) * G(a),$
the evaluation of the double integral by an iterated integral following from Fubini's theorem [S, pp. 76-88]. However, (33) may hold without X, Y being independent, and Theorem 6 shows that (34) will then hold also, and conversely.

An example where H(x) = F(x) * G(x) without X, Y being independent has been given by Cramér [C, p. 317, exercise 2]. We shall give another. Let points O, A, ···, F in the (x, y)-plane be defined as follows:

O = (0, 0), A = (1, 1), B = (1/2, 1), C = (0, 1/2), D = (1, 0),
E = (1, 1/2), F = (1/2, 0).

Let f(x, y) have the value 2 inside the quadrilateral OABC and the triangle DEF, and 0 elsewhere. Then if f(x, y) is the joint frequency function of X, Y it is easily seen that X and Y have uniform distributions on the intervals 0 < x < 1, 0 < y < 1 respectively and that Z = X + Y has the triangular distribution given by (33), although X and Y are not independent.
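
The claims about this example can be checked by crude numerical integration. The following is a rough sketch only: the midpoint grid, the grid size, and the function names are assumptions made for illustration. Inside the unit square the quadrilateral OABC is the band x ≤ y ≤ x + 1/2 and the triangle DEF is the region y ≤ x − 1/2.

```python
def density(x, y):
    """Joint frequency function: 2 on quadrilateral OABC and triangle DEF."""
    in_oabc = x <= y <= x + 0.5   # between lines OA (y = x) and CB (y = x + 1/2)
    in_def = y <= x - 0.5         # below line FE (y = x - 1/2)
    return 2.0 if (in_oabc or in_def) else 0.0

def prob(event, n=400):
    """Midpoint-rule approximation of P[event(X, Y)] over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if event(x, y):
                total += density(x, y) * h * h
    return total

p_x = prob(lambda x, y: x <= 0.3)                 # uniform marginal: about 0.3
p_z = prob(lambda x, y: x + y <= 0.8)             # triangular law: about 0.8**2 / 2
p_joint = prob(lambda x, y: x <= 0.25 and y > 0.8)
```

Here `p_joint` is zero because no part of either region lies in that corner, while the product of the two marginal probabilities is 0.25 × 0.2 = 0.05, so X and Y cannot be independent.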

It would be interesting to know what distribution functions F(x) are such that if X, Y, Z = X + Y are random variables with the distribution functions F(x), F(x), F(x) * F(x) respectively, then X and Y are necessarily independent. A rather trivial example of such a distribution function is the step function F(x) with jumps of 1/2 at the points x = 0 and x = 1. It can be shown (oral communication by W. Hoeffding), in generalization of Cramér's example, that no absolutely continuous distribution function (e.g. the normal distribution function) has this property.


5. The problem of random sampling from a mixed population. Let G(v) be a distribution function in the real variable v, and let F(u, v) be for a.e. (relative to the measure corresponding to G) v a distribution function in the real variable u, and for every u a Borel measurable function of v. Let
(37)  $H(u) = \int_{-\infty}^{\infty} F(u, v)\,dG(v);$
then by Theorem 4 H(u) is a distribution function in $R_1$. Now define for $x = (x_1, \cdots, x_n)$, $y = (y_1, \cdots, y_n)$
(38)  $H(x) = H(x_1) \cdots H(x_n), \qquad G(y) = G(y_1) \cdots G(y_n).$
Both H(x) and G(y) are then distribution functions in $R_n$. In particular, H(x) is the distribution function of a random sample of n independent variates each with the distribution function (37). Set
(39)  $F(x, y) = F(x_1, y_1) \cdots F(x_n, y_n);$
then for a.e. (relative to the measure corresponding to G) y, F(x, y) is a distribution function in x, and for every x, F(x, y) is a Borel measurable function of y. By Fubini's theorem we have
(40)  $H(x) = \int_{-\infty}^{\infty} F(x_1, y_1)\,dG(y_1) \cdots \int_{-\infty}^{\infty} F(x_n, y_n)\,dG(y_n)$
  $= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} F(x_1, y_1) \cdots F(x_n, y_n)\,dG(y_1) \cdots dG(y_n)$
  $= \int_{-\infty}^{\infty} F(x, y)\,dG(y).$
Thus H(x) is itself a mixture in the sense of Theorem 4. It follows from Theorem 5 that for any Borel measurable function f(x),
(41)  $\int_{-\infty}^{\infty} f(x)\,dH(x) = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f(x)\,d_x F(x, y) \right\} dG(y),$
if and only if the left side of (41) exists. When written out in full (41) becomes
(42)  $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,d_{x_1}\left\{ \int_{-\infty}^{\infty} F(x_1, y_1)\,dG(y_1) \right\} \cdots d_{x_n}\left\{ \int_{-\infty}^{\infty} F(x_n, y_n)\,dG(y_n) \right\}$
  $= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,d_{x_1} F(x_1, y_1) \cdots d_{x_n} F(x_n, y_n) \right\} dG(y_1) \cdots dG(y_n).$










Equation (41) is of particular interest in connection with the distribution of a statistic $t = t(x_1, \cdots, x_n) = t(x)$. For any distribution function J(x) let K(t | J) denote the distribution function of t when x has the distribution function J(x). If we set
(43)  $f(x) = \begin{cases} 1 & \text{if } t(x) \le t, \\ 0 & \text{otherwise,} \end{cases}$
then
(44)  $K(t \mid J) = \int_{-\infty}^{\infty} f(x)\,dJ(x).$
Hence from (41),
(45)  $K(t \mid H(x_1) \cdots H(x_n)) = K(t \mid H) = \int_{-\infty}^{\infty} K(t \mid F(x, y))\,dG(y)$
  $= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))\,dG(y_1) \cdots dG(y_n).$


As an example, let t(x) be Student's ratio
(46)  $t = n^{1/2} \cdot \bar{x}/s,$
let
(47)  $F(u, v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}(y - v)^2}\,dy,$
and let
(48)  $G(v) = \begin{cases} 0 & \text{for } v < -a, \\ \tfrac{1}{2} & \text{for } -a \le v < a, \\ 1 & \text{for } a \le v. \end{cases}$
Then H(u) will be the distribution function of a mixture in equal proportions of two normal populations with unit variances and with means −a, a respectively, and $K(t \mid H(x_1) \cdots H(x_n))$ will be the distribution function of t in random samples of n from this non-normal population. On the other hand, $K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))$ will be the distribution function of t in sampling from successive normal populations with unit variances and means $y_1, \cdots, y_n$ respectively. Relation (45) now becomes
(49)  $K(t \mid H(x_1) \cdots H(x_n)) = \sum K(t \mid F(x_1, y_1) \cdots F(x_n, y_n)) / 2^n,$
where the summation is over all $2^n$ sets $(y_1, \cdots, y_n)$, each $y_i$ being either −a or a. Due to the complexity of $K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))$ (the frequency function of which is discussed in a forthcoming paper by the author), relation (49) is not very useful. In other cases (45) may afford a considerable simplification in the evaluation of the distribution function of a statistic obtained in random sampling from a mixed population.
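
As an illustration of a case where (45) does simplify matters, one may replace Student's ratio by the sample mean, whose distribution under independent normal variates with unit variances is known in closed form. This substitution is an assumption made here for illustration and is not an example treated in the text; the function names are hypothetical. The sketch evaluates the right side of (49) by enumerating all $2^n$ sign patterns.

```python
import math
from itertools import product

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def K_given_means(t, ys):
    """Distribution function of the sample mean when x_i is N(y_i, 1), independently."""
    n = len(ys)
    return Phi(math.sqrt(n) * (t - sum(ys) / n))

def K_mixture(t, n, a):
    """Right side of (49) with the sample mean as the statistic."""
    patterns = product((-a, a), repeat=n)
    return sum(K_given_means(t, ys) for ys in patterns) / 2 ** n

def H(u, a):
    """Mixture (37) with F as in (47) and G as in (48)."""
    return 0.5 * Phi(u + a) + 0.5 * Phi(u - a)
```

For n = 1 the statistic is the observation itself, so `K_mixture(t, 1, a)` coincides with `H(t, a)`; for larger n the sum has $2^n$ terms but each one is a single evaluation of the normal distribution function.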


REFERENCES

[1] W. Feller, "On the integro-differential equations of purely discontinuous Markoff processes," Am. Math. Soc. Trans., Vol. 48 (1940), p. 488.

[2] R. H. Cameron and W. T. Martin, "An unsymmetric Fubini theorem," Am. Math. Soc. Bull., Vol. 47 (1941), p. 121.

[3] P. R. Halmos, "The decomposition of measures," Duke Math. Jour., Vol. 8 (1941), p. 386.

[4] W. Feller, "On a general class of 'contagious' distributions," Annals of Math. Stat., Vol. 14 (1943), p. 389.









SOME APPLICATIONS OF THE MELLIN TRANSFORM IN STATISTICS


By BENJAMIN EPSTEIN 
Coal Research Laboratory, Carnegie Institute of Technology 


1. Summary. It is well known that the Fourier transform is a powerful analytical tool in studying the distribution of sums of independent random variables. In this paper it is pointed out that the Mellin transform is a natural analytical tool to use in studying the distribution of products and quotients of independent random variables. Formulae are given for determining the probability density functions of the product $\xi\eta$ and the quotient $\xi/\eta$, where $\xi$ and $\eta$ are independent positive random variables with p.d.f.'s f(x) and g(y), in terms of the Mellin transforms $F(s) = \int_0^{\infty} f(x)\,x^{s-1}\,dx$ and $G(s) = \int_0^{\infty} g(y)\,y^{s-1}\,dy$. An extension of the transform technique to random variables which are not everywhere positive is given. A number of examples including Student's t-distribution and Snedecor's F-distribution are worked out by the technique of this paper.


2. Introduction. It is well known [2], [3] that the Fourier transform is a 
useful analytical tool for studying the distribution of the sums of independent 
random variables. It is our purpose in this paper to study another transform 
which is useful in studying the distribution of the product of independent random 
variables. While it is perfectly true that one can reduce the study of the distribution of the random variable $\xi = \xi_1 \xi_2 \cdots \xi_n$, the product of n independent random variables $\xi_1, \xi_2, \cdots, \xi_n$, to the study of the distribution of the random variable $\eta = \log \xi = \log \xi_1 + \log \xi_2 + \cdots + \log \xi_n$, the sum of n independent random variables, it seems worth while to study the distribution problem directly.
There are advantages inherent in the direct attack on the distribution problem 
which are lost to a considerable degree, if the problem is so transformed that the 
Fourier transform becomes applicable. In this paper we shall show that the 
direct application of the Mellin transform to the study of the distribution of 
products of independent random variables yields results of interest. 


3. Connection between Mellin transforms and products of independent random variables. The key reason for the importance of Fourier transforms in studying the distribution of sums of independent random variables depends on the following result: if $\xi_1$ and $\xi_2$ are independent random variables with continuous probability density functions (henceforth abbreviated as p.d.f.), $f_1(x)$ and $f_2(x)$, respectively, then the p.d.f. f(x) of the random variable $\xi = \xi_1 + \xi_2$ is expressible¹ as
(1)  $f(x) = \int_{-\infty}^{\infty} f_1(x - y) f_2(y)\,dy = \int_{-\infty}^{\infty} f_2(x - y) f_1(y)\,dy.$










1 In this paper we shall assume throughout that we are dealing with random variables 
with continuous p.d.f.’s. The argument can be extended with some changes to distribu- 
tion functions which are perfectly general, but for simplicity this will not be done here. 




But since these expressions are just the Fourier convolutions of $f_1(x)$ and $f_2(x)$, it is small wonder that the Fourier transform plays such a basic role in studying the distribution properties of sums of independent random variables.

Consider now the following result for products of independent random variables [4], [5]: if $\xi_1$ is a random variable with continuous p.d.f. $f_1(x)$ and $\xi_2$, independent of $\xi_1$, is a positive random variable with continuous p.d.f. $f_2(x)$, then the p.d.f. f(x) of the random variable $\xi = \xi_1 \xi_2$ is expressible² as
(2)  $f(x) = \int_0^{\infty} \frac{1}{y} f_1\left(\frac{x}{y}\right) f_2(y)\,dy.$
But equation (2) is precisely in the form of a Mellin convolution of $f_1(x)$ and $f_2(x)$ and therefore it may be expected that the Mellin transform should be useful in studying the distribution of products of independent random variables.

It is useful to indicate briefly the properties of the Mellin transform. A detailed treatment of this transform will be found in [6] and we shall, therefore, stress only those portions of the theory of Mellin transforms which are of importance in the field of statistics. By definition, the Mellin transform F(s), corresponding to a function f(x) defined only³ for x ≥ 0, is
(3)  $F(s) = \int_0^{\infty} f(x)\,x^{s-1}\,dx.$
Under certain restrictions on f(x) [6, p. 47], F(s) considered as a function of the complex variable s is a function of exponential type, analytic in a strip parallel to the imaginary axis. The width of the strip is governed by the order of magnitude of f(x) in the neighborhood of the origin and for large values of x and, in particular, the strip of analyticity becomes a half-plane if f(x) decays exponentially as x → ∞. There is a reciprocal formula enabling one to go from the transform F(s) to the function f(x). This transformation is:
(4)  $f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} x^{-s} F(s)\,ds$
for all x where f(x) is continuous and where the path of integration is any line parallel to the imaginary axis and lying within the strip of analyticity of F(s).


² More generally [4, p. 411], if $\xi_1$ and $\xi_2$ are independent random variables with continuous p.d.f.'s $f_1(x)$ and $f_2(x)$, then the p.d.f. of the random variable $\xi = \xi_1 \xi_2$ is expressible as:
$f(x) = \int_{-\infty}^{\infty} \frac{1}{|y|} f_1\left(\frac{x}{y}\right) f_2(y)\,dy = \int_{-\infty}^{\infty} \frac{1}{|y|} f_2\left(\frac{x}{y}\right) f_1(y)\,dy.$
In [4] analogous results are given for random variables with perfectly general distribution functions.

³ The reason for this restriction is that there are technical difficulties in defining a Mellin transform directly for a function defined over (−∞, ∞). In [6], for instance, the Mellin transform theory is given for functions defined only for positive values of the argument. In statistical terminology this means that we are restricting ourselves for the moment to positive random variables. This is, of course, an unnatural restriction and we shall indicate later in the paper a simple device for treating such questions.











If, in particular, we are interested in applying Mellin transforms to p.d.f.'s of positive⁴ random variables, the analysis can be carried out rigorously. Also, as in the case of the Fourier transform, one has the desirable property that there is a one-one correspondence between p.d.f.'s and their transforms.

A number of common p.d.f.'s of positive random variables have simple Mellin transforms. For example see Table 1.

In terms familiar to the mathematical statistician, the Mellin transform of a positive random variable $\xi$ with continuous p.d.f. f(x) is $E(\xi^{s-1})$, where
(5)  $F(s) = E(\xi^{s-1}) = \int_0^{\infty} x^{s-1} f(x)\,dx.$
The following three basic properties hold:

(i) The positive random variable $\eta = a\xi$, a > 0, has the Mellin transform $G(s) = a^{s-1} F(s)$. This is immediate since
(6)  $G(s) = E(\eta^{s-1}) = E(a^{s-1} \xi^{s-1}) = a^{s-1} F(s).$

(ii) The positive random variable $\eta = \xi^{\alpha}$ has the Mellin transform $G(s) = F(\alpha s - \alpha + 1)$. To prove this we note that
(7)  $G(s) = E(\eta^{s-1}) = E(\xi^{\alpha s - \alpha}) = F(\alpha s - \alpha + 1).$
In particular if $\alpha = -1$, i.e., $\eta = 1/\xi$, then
$G(s) = F(-s + 2).$
This is a result which we shall have occasion to use later in the paper.

(iii) If $\xi_1$ and $\xi_2$ are independent positive random variables with Mellin transforms $F_1(s)$ and $F_2(s)$, respectively, then the Mellin transform of the product $\eta = \xi_1 \xi_2$ is $G(s) = F_1(s) F_2(s)$. This is immediate since
(8)  $G(s) = E(\eta^{s-1}) = E[(\xi_1 \xi_2)^{s-1}] = E(\xi_1^{s-1})\,E(\xi_2^{s-1}) = F_1(s) F_2(s).$
More generally if $\xi_1, \xi_2, \cdots, \xi_n$ are independent positive random variables with Mellin transforms $F_1(s), F_2(s), \cdots, F_n(s)$, then the Mellin transform of the random variable $\eta = \xi_1 \xi_2 \cdots \xi_n$ is $G(s) = F_1(s) F_2(s) \cdots F_n(s)$. This relationship is fundamental and justifies the introduction of Mellin transforms in studying products of independent random variables.
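
Property (iii) can be illustrated with two independent variables uniformly distributed on (0, 1), for which each Mellin transform is 1/s. The sketch below is an illustration only: it relies on the standard fact, not derived in the text, that the product of two such variables has p.d.f. −log y on (0, 1), and the quadrature routine and names are hypothetical.

```python
import math

def mellin(f, s, n=100000):
    """Midpoint-rule approximation of the integral of f(x) x^(s-1) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += f(x) * x ** (s - 1)
    return total * h

def uniform_pdf(x):
    """p.d.f. of a variable uniformly distributed on (0, 1)."""
    return 1.0

def product_pdf(y):
    """p.d.f. of the product of two independent uniform (0, 1) variables."""
    return -math.log(y)
```

For real s > 0 the value `mellin(product_pdf, s)` should agree with `mellin(uniform_pdf, s) ** 2`, that is with $1/s^2$, up to quadrature error.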

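From (8) it is clear that we can find the p.d.f. g(y) of the random variable $\eta$ which is the product of two positive independent random variables $\xi_1$ and $\xi_2$ with continuous p.d.f.'s $f_1(x)$ and $f_2(x)$. In fact, by the Mellin inversion formula
(9)  $g(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} G(s)\,ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} F_1(s) F_2(s)\,ds,$
where the path of integration is any line parallel to the imaginary axis and lying within the strip of analyticity of G(s). As in the case of characteristic functions, it can be shown that there is a one-one correspondence between p.d.f.'s and their Mellin transforms. Therefore, it follows that the p.d.f. g(y) computed in this way must be precisely equal to
(10)  $g(y) = \int_0^{\infty} \frac{1}{x} f_1\left(\frac{y}{x}\right) f_2(x)\,dx = \int_0^{\infty} \frac{1}{x} f_2\left(\frac{y}{x}\right) f_1(x)\,dx.$

⁴ See footnote 3.

TABLE 1

(a) p.d.f.: $f(x) = 1$, $0 \le x \le 1$; $= 0$ elsewhere.
    Mellin transform: $F(s) = \dfrac{1}{s}$.
    Region of analyticity of transform: half-plane, Re(s) > 0.

(b) p.d.f.: $f(x) = \dfrac{x^{\alpha} e^{-x}}{\Gamma(\alpha + 1)}$, $0 < x < \infty$, $\alpha > -1$.
    Mellin transform: $F(s) = \dfrac{\Gamma(\alpha + s)}{\Gamma(\alpha + 1)}$.
    Region of analyticity of transform: half-plane, Re(s) > −α.

(c) p.d.f.: $f(x) = \dfrac{\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 1)\Gamma(\beta + 1)}\,x^{\alpha}(1 - x)^{\beta}$, $0 < x < 1$, $\alpha > -1$, $\beta > -1$.
    Mellin transform: $F(s) = \dfrac{\Gamma(\alpha + \beta + 2)\,\Gamma(\alpha + s)}{\Gamma(\alpha + 1)\,\Gamma(\alpha + \beta + s + 1)}$.
    Region of analyticity of transform: half-plane, Re(s) > −α.

(d) p.d.f.: $f(x) = \dfrac{\Gamma(\beta)}{\Gamma(\alpha + 1)\Gamma(\beta - \alpha - 1)} \cdot \dfrac{x^{\alpha}}{(1 + x)^{\beta}}$, $0 < x < \infty$, $\alpha > -1$, $\beta - \alpha > 1$.
    Mellin transform: $F(s) = \dfrac{\Gamma(\alpha + s)\,\Gamma(\beta - \alpha - s)}{\Gamma(\alpha + 1)\,\Gamma(\beta - \alpha - 1)}$.
    Region of analyticity of transform: strip, −α < Re(s) < β − α.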


It is easy to verify this directly by showing that the Mellin transform of the 
right-hand side of (10) is F1(s) F2(s) [6, p. 52], but this will not be done here. 
The essential point is that Equation (9), (which is sometimes easier to evaluate 
than Equation (10)), is a consequence of an algebraic formalism which is 
capable of revealing relationships which would otherwise remain hidden. 


The p.d.f. h(y) of $\eta = \xi_1/\xi_2$, the ratio of two positive random variables with continuous p.d.f.'s, can be reduced to finding the p.d.f. of the product of independent random variables $\xi_1$ and $1/\xi_2$. If $F_1(s)$ and $F_2(s)$ are the Mellin transforms corresponding to $\xi_1$ and $\xi_2$, respectively, then by (ii) $F_2(-s + 2)$ is the Mellin transform of $1/\xi_2$ and, therefore, the Mellin transform H(s) of $\eta = \xi_1/\xi_2$ is $F_1(s) F_2(-s + 2)$. Therefore, the p.d.f. h(y) of $\eta$ is
(11)  $h(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} H(s)\,ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} F_1(s) F_2(-s + 2)\,ds.$
This formula is useful in finding distributions such as Student's t and Fisher's z.


4. A modified Mellin transform procedure for finding the distribution of the product of independent random variables which are not everywhere positive. Up to this point we have limited ourselves to the application of the Mellin transform to finding the distribution of the product or ratio of two positive independent random variables. While it is true that a number of interesting probability density functions are defined only for positive⁵ values of the argument, it is certainly desirable that we be able to treat situations involving random variables capable of taking on both positive and negative values. A simple device for extending the Mellin transform treatment to the more general problem is to decompose the p.d.f.'s $f_1(x)$ and $f_2(x)$ of the independent random variables $\xi_1$ and $\xi_2$ into
$f_1(x) = f_{11}(x) + f_{12}(x),$
$f_2(x) = f_{21}(x) + f_{22}(x),$
where⁶
$f_{11}(x) = 0,\ x < 0; \qquad f_{12}(x) = 0,\ x > 0;$
$f_{21}(x) = 0,\ x < 0; \qquad f_{22}(x) = 0,\ x > 0,$
and then to operate on the pairs $[f_{11}(x), f_{21}(x)]$, $[f_{11}(x), f_{22}(x)]$, $[f_{12}(x), f_{21}(x)]$, and $[f_{12}(x), f_{22}(x)]$ separately. More specifically, the frequency distribution h(y) corresponding to the random variable $\eta = \xi_1 \xi_2$ is made up of the sum of four components $h_1(y)$, $h_2(y)$, $h_3(y)$, and $h_4(y)$. To compute $h_1(y)$ one can apply the Mellin transform directly to the evaluation of the expression
$h_1(y) = \int_0^{\infty} \frac{1}{x} f_{11}\left(\frac{y}{x}\right) f_{21}(x)\,dx,$
since both $f_{11}(x)$ and $f_{21}(x)$ are zero for negative values of x. The function $h_1(y)$ is zero for y < 0. To compute $h_2(y)$ we first evaluate
$h_2^{*}(y) = \int_0^{\infty} \frac{1}{x} f_{11}\left(\frac{y}{x}\right) f_{22}(-x)\,dx.$
Again $f_{11}(x)$ and $f_{22}(-x)$ are zero for negative values of x and, therefore, the conventional Mellin transform can be applied in determining $h_2^{*}(y)$. It is clear that $h_2^{*}(y) = 0$ for y < 0 and, therefore, $h_2(y) = h_2^{*}(-y) = 0$ for y > 0. Similarly, one can find $h_3(y)$ and $h_4(y)$ where $h_3(y) = 0$ for y > 0 and $h_4(y) = 0$ for y < 0, and it is readily seen that⁷
$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$
is the desired p.d.f. of $\eta = \xi_1 \xi_2$.

⁵ For example, distributions of type 3, the $\chi^2$ distribution, the distribution of the sample standard deviation and sample variance, the distribution of an even power of a random variable, etc. are all defined only for positive values of the argument.


5. Examples of use of Mellin transforms in evaluating the product and quotient of independent random variables. Example 1: The distribution of $\eta = \xi_1 \xi_2$, where $\xi_1$ and $\xi_2$ are independent random variables with p.d.f.'s $f_1(x)$ and $f_2(x)$, respectively, where
$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty.$
In this case
$f_1(x) = f_{11}(x) + f_{12}(x),$
and
$f_2(x) = f_{21}(x) + f_{22}(x),$
where
$f_{11}(x) = 0,\ x < 0; \qquad f_{12}(x) = 0,\ x > 0;$
$f_{21}(x) = 0,\ x < 0; \qquad f_{22}(x) = 0,\ x > 0.$

⁶ Of course, $f_{11}$, $f_{12}$, $f_{21}$, and $f_{22}$ are generally not p.d.f.'s since $\int_0^{\infty} f_{11}(x)\,dx$, $\int_{-\infty}^{0} f_{12}(x)\,dx$, $\int_0^{\infty} f_{21}(x)\,dx$, $\int_{-\infty}^{0} f_{22}(x)\,dx$ are no longer necessarily equal to one.

⁷ As in footnote 6, $h_1$, $h_2$, $h_3$, and $h_4$ are, in general, not p.d.f.'s.


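The random variable $\eta = \xi_1 \xi_2$ has a p.d.f. $h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$ where

$h_1(y)$ is associated with $[f_{11}(x), f_{21}(x)]$,
$h_2(y)$ is associated with $[f_{11}(x), f_{22}(x)]$,
$h_3(y)$ is associated with $[f_{12}(x), f_{21}(x)]$,
and $h_4(y)$ is associated with $[f_{12}(x), f_{22}(x)]$.

It is sufficient to evaluate
$h_1(y) = \int_0^{\infty} \frac{1}{x} f_{11}\left(\frac{y}{x}\right) f_{21}(x)\,dx = \int_0^{\infty} \frac{1}{x} f_{21}\left(\frac{y}{x}\right) f_{11}(x)\,dx.$
In this case
$F_{11}(s) = \int_0^{\infty} x^{s-1} f_{11}(x)\,dx = \int_0^{\infty} x^{s-1} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = \frac{2^{\frac{1}{2}(s-3)}}{\sqrt{\pi}}\,\Gamma(s/2),$
analytic for Re(s) > 0, and
$F_{21}(s) = \int_0^{\infty} x^{s-1} f_{21}(x)\,dx = \frac{2^{\frac{1}{2}(s-3)}}{\sqrt{\pi}}\,\Gamma(s/2).$
Therefore,
$H_1(s) = F_{11}(s) F_{21}(s) = \frac{2^{s-3}}{\pi}\,\Gamma^2(s/2),$
$h_1(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} H_1(s)\,ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s}\,\frac{2^{s-3}}{\pi}\,\Gamma^2(s/2)\,ds, \qquad c > 0,$
$\phantom{h_1(y)} = \frac{1}{2\pi} K_0(y), \qquad y > 0 \quad [6, p. 197]$
where $K_0(y)$ is Bessel's function of the second kind with a purely imaginary argument of zero order. Similarly
$h_2(y) = \frac{1}{2\pi} K_0(|y|), \qquad y < 0,$
$h_3(y) = \frac{1}{2\pi} K_0(|y|), \qquad y < 0,$
$h_4(y) = \frac{1}{2\pi} K_0(y), \qquad y > 0.$
Therefore,
$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi} K_0(|y|), \qquad -\infty < y < \infty,$
and this is the desired p.d.f. This result has been found by other methods and is given in [1, p. 1].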
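
The result of Example 1 can be checked numerically without the inversion integral. The sketch below is an illustration under stated assumptions: it evaluates the product convolution of the two normal p.d.f.'s by a midpoint rule, and it computes $K_0$ from the standard integral representation $K_0(y) = \int_0^{\infty} e^{-y \cosh t}\,dt$ for y > 0, which is not quoted in the text. The truncation limits, step counts, and function names are arbitrary choices.

```python
import math

def product_density(y, upper=12.0, n=40000):
    """Density at y > 0 of the product of two independent standard normal variables.

    Equals 2 * integral over (0, inf) of (1/x) phi(y/x) phi(x) dx, the factor 2
    accounting for the contributions h_1 and h_4.
    """
    def phi(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += phi(y / x) * phi(x) / x
    return 2.0 * total * h

def bessel_k0(y, upper=12.0, n=40000):
    """K_0(y) for y > 0 from the integral of exp(-y cosh t) over (0, inf)."""
    h = upper / n
    return sum(math.exp(-y * math.cosh((i + 0.5) * h)) for i in range(n)) * h
```

For y > 0 the two quantities `product_density(y)` and `bessel_k0(y) / math.pi` should agree to within the quadrature error.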

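Example 2: The distribution of $\eta = \xi_1/\xi_2$ where $\xi_1$ and $\xi_2$ are independent random variables with p.d.f.'s $f_1(x)$ and $f_2(x)$, respectively, where
$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty.$
As in Example 1, one splits the determination of h(y), the p.d.f. of $\eta$, into four parts: $h_1(y)$, $h_2(y)$, $h_3(y)$, $h_4(y)$. In the notation of Example 1 it is easy to show that $H_1(s)$, the Mellin transform of $h_1(y)$, is
$F_{11}(s) F_{21}(-s + 2) = \frac{2^{\frac{1}{2}(s-3)}}{\sqrt{\pi}}\,\Gamma(s/2) \cdot \frac{2^{\frac{1}{2}(-s-1)}}{\sqrt{\pi}}\,\Gamma(-s/2 + 1) = \frac{1}{4 \sin\frac{\pi s}{2}},$
$h_1(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} H_1(s)\,ds, \qquad 0 < c < 2,$
$\phantom{h_1(y)} = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{1}{4}\,\frac{y^{-s}}{\sin\frac{\pi s}{2}}\,ds = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \qquad y \ge 0.$
Similarly
$h_2(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \qquad y \le 0,$
$h_3(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \qquad y \le 0,$
$h_4(y) = \frac{1}{2\pi}\,\frac{1}{1 + y^2}, \qquad y \ge 0.$
Therefore,
$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\,\frac{1}{1 + y^2}, \qquad -\infty < y < \infty.$
This result has been found by other methods and given in [4, p. 411].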
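
The transform pair used in Example 2 can be verified numerically for real s in the strip 0 < s < 2. The following sketch is an illustration only; the substitution y = exp(u), the truncation width, and the function names are assumptions made for the quadrature and are not part of the original derivation.

```python
import math

def mellin_log(f, s, half_width=60.0, n=60000):
    """Integral of f(y) y^(s-1) over (0, inf), via y = exp(u) and the midpoint rule."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        u = -half_width + (i + 0.5) * h
        total += f(math.exp(u)) * math.exp(s * u)   # y^(s-1) dy = exp(s u) du
    return total * h

def h1(y):
    """Component h_1(y) of the quotient density, for y >= 0."""
    return 1.0 / (2.0 * math.pi * (1.0 + y * y))

def H1_closed_form(s):
    """Closed form 1 / (4 sin(pi s / 2))."""
    return 1.0 / (4.0 * math.sin(math.pi * s / 2.0))

def H1_gamma_form(s):
    """F_11(s) F_21(2 - s) with F(s) = 2^((s-3)/2) Gamma(s/2) / sqrt(pi)."""
    def F(z):
        return 2.0 ** ((z - 3.0) / 2.0) * math.gamma(z / 2.0) / math.sqrt(math.pi)
    return F(s) * F(2.0 - s)
```

All three quantities, `mellin_log(h1, s)`, `H1_closed_form(s)`, and `H1_gamma_form(s)`, should coincide for 0 < s < 2; at s = 1 each equals 1/4, the probability mass carried by $h_1$.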
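Example 3: F-Distribution. Let $\xi_1, \cdots, \xi_m, \eta_1, \cdots, \eta_n$ be (m + n) independent random variables, each normally distributed with mean zero and standard deviation $\sigma$. Let
$\xi = \sum_{i=1}^{m} \xi_i^2, \qquad \eta = \sum_{j=1}^{n} \eta_j^2.$
We want to find the p.d.f. h(z) of $\zeta$ where $\zeta = \xi/\eta$. The p.d.f.'s f(x) and g(y) of $\xi$ and $\eta$, respectively, are:
$f(x) = \frac{x^{m/2 - 1}\,e^{-x/2\sigma^2}}{2^{m/2} \sigma^m\,\Gamma(m/2)}, \qquad x > 0,$
and
$g(y) = \frac{y^{n/2 - 1}\,e^{-y/2\sigma^2}}{2^{n/2} \sigma^n\,\Gamma(n/2)}, \qquad y > 0.$
In this case
$F(s) = \frac{2^{s-1} \sigma^{2(s-1)}\,\Gamma\left(s + \frac{m}{2} - 1\right)}{\Gamma(m/2)},$ analytic for $\mathrm{Re}(s) > 1 - \frac{m}{2}$,
and
$G(s) = \frac{2^{s-1} \sigma^{2(s-1)}\,\Gamma\left(s + \frac{n}{2} - 1\right)}{\Gamma(n/2)},$ analytic for $\mathrm{Re}(s) > 1 - \frac{n}{2}$.
The p.d.f. h(z) has Mellin transform
$H(s) = F(s)\,G(-s + 2) = \frac{\Gamma\left(\frac{m}{2} + s - 1\right)\Gamma\left(\frac{n}{2} - s + 1\right)}{\Gamma(m/2)\,\Gamma(n/2)}.$
Therefore,
$h(z) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} z^{-s} H(s)\,ds, \qquad 1 - \frac{m}{2} < c < 1 + \frac{n}{2},$
$\phantom{h(z)} = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \cdot \frac{z^{m/2 - 1}}{(1 + z)^{(m+n)/2}}, \qquad z > 0.$
A convenient way of carrying out the inversion is to use formula (d) in Table 1.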
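
The transform pair of Example 3 can also be checked numerically for real s in the strip of analyticity. The sketch below is an illustration only; the degrees of freedom used in the check, the substitution z = exp(u), and all names are hypothetical choices.

```python
import math

def ratio_pdf(z, m, n):
    """p.d.f. h(z) of zeta = xi / eta obtained in Example 3."""
    const = math.gamma((m + n) / 2.0) / (math.gamma(m / 2.0) * math.gamma(n / 2.0))
    return const * z ** (m / 2.0 - 1.0) / (1.0 + z) ** ((m + n) / 2.0)

def H_closed_form(s, m, n):
    """H(s) = F(s) G(-s + 2) expressed through Gamma functions."""
    return (math.gamma(m / 2.0 + s - 1.0) * math.gamma(n / 2.0 - s + 1.0)
            / (math.gamma(m / 2.0) * math.gamma(n / 2.0)))

def mellin_log(f, s, half_width=60.0, steps=60000):
    """Integral of f(z) z^(s-1) over (0, inf), via z = exp(u) and the midpoint rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        u = -half_width + (i + 0.5) * h
        total += f(math.exp(u)) * math.exp(s * u)   # z^(s-1) dz = exp(s u) du
    return total * h
```

With, say, m = 4 and n = 6 the strip is −1 < s < 4, and `mellin_log(lambda z: ratio_pdf(z, 4, 6), s)` should reproduce `H_closed_form(s, 4, 6)`; at s = 1 both equal one, since h(z) is a p.d.f.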
In a similar way one can find Student's distribution, i.e., the distribution of $\zeta = \xi/\eta$, where $\eta = \sqrt{\sum_{1}^{n} \xi_i^2 / n}$, and where $\xi, \xi_1, \cdots, \xi_n$ are n + 1 independent random variables each having the distribution:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}, \qquad -\infty < x < \infty.$







It should be mentioned in conclusion that the Mellin transform is a natural tool to use in situations involving the products and quotients of independent uniformly distributed random variables, or in finding products and/or quotients of random variables having the Gamma- and/or Beta-distribution. In such cases formulae (b), (c) and (d) in Table 1 are useful.


REFERENCES

[1] C. C. Craig, "On the frequency function of xy," Annals of Math. Stat., Vol. 7 (1936), pp. 1-15.

[2] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics, No. 36, Cambridge, 1937.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] J. H. Curtiss, "On the distribution of the quotient of two chance variables," Annals of Math. Stat., Vol. 12 (1941), pp. 409-421.

[5] E. V. Huntington, "Frequency distribution of product and sum," Annals of Math. Stat., Vol. 10 (1939), pp. 195-198.

[6] E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals, Clarendon Press, Oxford, 1937.











THE ESTIMATION OF LINEAR TRENDS 


By G. W. Housner anv J. F. BRENNAN 


California Institute of Technology 


1. Summary. This paper deals with the problem of bivariate regression 
where both variates are random variables having a finite number of means dis- 
tributed along a straight line. A regression statistic is derived which is inde- 
pendent of change in scale so that a prior knowledge of the frequency distribution 
parameters is not required in order to obtain a unique estimate. The statistic 
is shown to be consistent. The efficiency of the estimate is discussed and its 
asymptotic distribution is derived for the case when the random variables 
are normally distributed. A numerical example is presented which compares 
the performance of the statistic of this paper with that of other commonly used 
statistics. In the example it is found that the method of estimation proposed 
in this paper is more efficient. 


2. Introduction. A problem that often arises in statistical work is the estimation of linear trends. In the general problem it is known or presumed that a linear functional relation exists among a set of variables of the form,
$a + b_1 X + b_2 Y + b_3 Z + \cdots = 0.$
The observed values of the variables are of the form
$x_{ik} = X_i + \epsilon_{ik}, \qquad y_{ik} = Y_i + \eta_{ik}, \qquad \text{etc.}$
That is, the $x_i$ are random variables with means $X_i$ and $k = 1, 2, \cdots, N_i$ observed values of x are associated with the mean $X_i$. The ordering of the $X_i$ is according to magnitude. Similarly there are the observed values $y_{ik}$, $z_{ik}$, and so forth. The $\epsilon_{ik}$ are random variables, with the same distribution for all i, with zero means. On the basis of a sample $O_n(x_{ik}, y_{ik}, z_{ik}, \cdots)$ it is desired to estimate the coefficients $a, b_1, b_2, b_3, \cdots$. One method used to estimate the coefficients is that of "weighted regression" which is essentially an application of the method of least squares. The problem has been studied by R. Allen, A. Wald and others.¹ The chief difficulty has been that the proposed methods of estimation require an a priori knowledge of the variances of the random variables. Wald has proposed a statistic which avoids this difficulty but which may have a relatively low efficiency in cases often encountered in practice. In this paper there is described a bivariate statistic which appears to have comparatively high precision and which does not require prior knowledge of the variances of the random variables. A numerical example is given at the end of the paper to illustrate the comparative performances of different methods of estimation.





1 For a brief history of work done on this problem see the paper by A. Wald in the Annals 
of Math. Stat., Vol. 11 (1940), p. 284. 







LINEAR TRENDS 381 


3. The Regression statistic. In the case of the bivariate problem, consider a
sample

$$O_n(x_{ik}, y_{ik}), \qquad i = 1, 2, \cdots, n$$

and

$$k = 1, 2, \cdots, N_i,$$

where the $N_i$ sample values $x_{ik}$, $y_{ik}$ are distributed about the
means $X_i$, $Y_i$. Let the means be related by $Y_i = a + bX_i$, let the
random variables $x_{ik}$ be independent and have the same frequency
distribution with variance $\sigma_x^2$ for all $i$, and let the random
variables $y_{ik}$ have independent frequency distributions with variance
$\sigma_y^2$, the same for all $i$. An appropriate statistic for estimating $b$
is obtained by noting that a pair of sample points $(x_{ik}, y_{ik})$,
$(x_{jl}, y_{jl})$ gives a sample value of the change in $y$ corresponding to a
change in $x$. It may thus be said that a sample value of $b$ is

$$b_{ij} = \frac{y_{ik} - y_{jl}}{x_{ik} - x_{jl}}. \tag{1}$$

Making use of the fact that

$$y_{ik} = a + b x_{ik} + \eta_{ik} - b\epsilon_{ik} \tag{2}$$

equation (1) may be written

$$(x_{ik} - x_{jl})\, b_{ij} = (x_{ik} - x_{jl})\, b + (\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl}).$$

Summing this equation over all combinations of points there is obtained

$$b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})} - \frac{\sum_i\sum_j\sum_k\sum_l \left[(\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl})\right]}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}. \tag{3}$$

The summations in the above expression are to be carried out for

$$l = 1, 2, \cdots, N_j; \quad k = 1, 2, \cdots, N_i; \quad j = 1, 2, \cdots, i - 1; \quad i = 1, 2, \cdots, n.$$

The first term on the right side of equation (3) is an estimate of $b$ and the
second term represents the deviation of the estimate from the true value.
Accordingly, we take as an estimate of $b$ the statistic

$$\hat b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}. \tag{4}$$

This requires, of course, that the denominator be not equal to zero. Summing
out the subscripts $k$ and $l$ reduces (4) to

$$\hat b = \frac{\sum_i\sum_j N_i N_j (\bar y_i - \bar y_j)}{\sum_i\sum_j N_i N_j (\bar x_i - \bar x_j)}$$








382 G. W. HOUSNER AND J. F. BRENNAN 


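where $\bar y_i$ is the mean value of the $y_{ik}$ and so forth. Summing out
the subscript $j$ gives

$$\hat b = \frac{\sum_{i=1}^{n}\left(N_i \bar y_i \sum_{j=1}^{i-1} N_j - N_i \sum_{j=1}^{i-1} N_j \bar y_j\right)}{\sum_{i=1}^{n}\left(N_i \bar x_i \sum_{j=1}^{i-1} N_j - N_i \sum_{j=1}^{i-1} N_j \bar x_j\right)}. \tag{5}$$

This expression may be put in a more convenient form by using the identity

$$\sum_{i=1}^{n}\Big(N_i \sum_{j=1}^{i-1} N_j \bar y_j\Big) = \sum_{i=1}^{n}\Big(N_i \bar y_i \sum_{j=i+1}^{n} N_j\Big) = \sum_{i=1}^{n} N_i \bar y_i \Big(\sum_{j=1}^{n} N_j - \sum_{j=1}^{i} N_j\Big).$$

With this substitution equation (5) becomes, after the sign of both numerator
and denominator is changed,

$$\hat b = \frac{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar y_i\Big]}{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar x_i\Big]}. \tag{6}$$

This is the statistic for estimating the linear trend of bivariate data. It
may be noted that its derivation is not based on the notion of fitting a line
to the sample points. A line $y = \hat a + \hat b x$ may be fitted to the
sample points by making it pass through the mean of the sample points, that
is, by using the following estimate:

$$\hat a = \bar y - \hat b\,\bar x$$

where $\bar y$ and $\bar x$ are the means of all the $y$ and $x$ respectively.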
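The weighted-mean form of the slope statistic in (6), together with the intercept taken through the overall means, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `trend_weights` and `trend_estimate` and the sample data are hypothetical, and the groups are assumed to be supplied already ordered by the magnitude of their means $X_i$.

```python
from statistics import mean

def trend_weights(counts):
    """Weights N_i * (sum_j N_j - 2 * sum_{j<=i} N_j + N_i) used in (6)."""
    total = sum(counts)
    weights, running = [], 0
    for n_i in counts:
        running += n_i                      # partial sum N_1 + ... + N_i
        weights.append(n_i * (total - 2 * running + n_i))
    return weights

def trend_estimate(groups_x, groups_y):
    """Return (a_hat, b_hat); groups must be ordered by increasing mean X_i."""
    counts = [len(g) for g in groups_x]
    w = trend_weights(counts)
    num = sum(wi * mean(gy) for wi, gy in zip(w, groups_y))
    den = sum(wi * mean(gx) for wi, gx in zip(w, groups_x))
    if den == 0:
        raise ZeroDivisionError("denominator of the trend statistic is zero")
    b_hat = num / den
    all_x = [x for g in groups_x for x in g]
    all_y = [y for g in groups_y for y in g]
    a_hat = mean(all_y) - b_hat * mean(all_x)   # line through the overall means
    return a_hat, b_hat

# Hypothetical error-free data on the line y = 2 + 3x, three groups.
groups_x = [[1.0, 1.2], [2.0], [2.9, 3.0, 3.1]]
groups_y = [[2.0 + 3.0 * x for x in g] for g in groups_x]
a_hat, b_hat = trend_estimate(groups_x, groups_y)
```

With error-free data the sketch returns the exact slope and intercept, because the weights add up to zero and the constant term of the line cancels from the numerator.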

4. Consistency of the estimate. Having established the statistics $\hat b$ and
$\hat a$ it is desirable to examine the consistency and efficiency of the
estimates, particularly for $\hat b$. To determine that $\hat b$ is a
consistent estimate we investigate the behavior of (6) as the number of sample
points increases, that is, as the $N_i \to \infty$.

We wish first to establish the following identity. Consider the sum of the
following array of terms:

$$\begin{array}{l}
N_1(N_1 + N_2 + \cdots + N_n) \\
N_2(N_1 + N_2 + \cdots + N_n) \\
\quad\cdots \\
N_n(N_1 + N_2 + \cdots + N_n)
\end{array}$$

The sum may be written $\sum_{i=1}^{n} N_i \sum_{j=1}^{n} N_j$. Since the array
of products $N_i N_j$ is symmetrical, the expression
$2\sum_{i=1}^{n} N_i \sum_{j=1}^{i} N_j$ also gives the sum of the array except
for the fact that the terms along the principal diagonal are counted twice. We
have, therefore

$$\sum_{i=1}^{n} N_i \sum_{j=1}^{n} N_j = 2\sum_{i=1}^{n} N_i \sum_{j=1}^{i} N_j - \sum_{i=1}^{n} N_i^2.$$




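Rearranging terms we obtain the identity

$$\sum_{i=1}^{n} N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big) = 0. \tag{7}$$

Now substituting (2) into (6) and making use of (7) there is obtained,

$$\hat b = b + \frac{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)(\bar\eta_i - b\bar\epsilon_i)\Big]}{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar x_i\Big]}. \tag{8}$$

The $\eta_{ik}$ and $\epsilon_{ik}$ are random variables with zero means so
that as $N_i \to \infty$ the sample means $\bar\eta_i$ and $\bar\epsilon_i$
converge in probability to zero. As $N_i \to \infty$, $\bar x_i$ converges in
probability to its mean $X_i$. In view of (7), and provided that the
denominator in (8) is not equal to zero, the last term in (8) converges in
probability to zero and $\hat b \to b$. The estimate is therefore consistent.
A similar argument also shows the estimate $\hat a$ to be consistent.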


5. Efficiency of the estimate. A general investigation of the efficiency of the
estimate $\hat b$ is beyond the scope of this paper. We may note, however,
that the efficiency of the estimate can be made to depend upon the grouping of
the data, that is, the optimum efficiency of the estimate may depend upon the
omission of some of the pairs $(y_{ik} - y_{jl})$ from the estimate. The
maximum efficiency is obtained for $\hat b$ when the second term in (3) is
minimized. This requires prior knowledge of the frequency distribution of the
random variables $x$ and $y$; however, in applications a recognition of (3)
may often indicate a practical method of increasing the efficiency.

In what follows we make an investigation of the precision of the estimate
$\hat b$ for a special case which is of some practical interest. Let $x$ and
$y$ be random variables as defined in the first part of the paper and consider
the new variables $u$ and $v$ defined by $\hat b = v/u$, that is,

$$u = \sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar x_i\Big],$$

$$v = \sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)\bar y_i\Big].$$

The random variables $u$ and $v$ are then independently distributed with joint
probability element $f(u) f(v)\,du\,dv$. Making the change of variable
$u = r\cos\theta$, $v = r\sin\theta$ the probability element becomes
$f(r, \theta)\,dr\,d\theta$ where $\tan\theta = v/u$. Integrating out the
variable $r$ gives the probability element for $\theta$. In what follows we
investigate the distribution of $\theta$ for the case where $x$ and $y$ are
normally distributed with the same variance. Since $u$ and $v$ are linear
functions of $x$ and $y$ respectively they are also normally distributed with
the same standard deviation.













We designate the means of $u$ and $v$ by $m_1$ and $m_2$ respectively and the
standard deviation by $\sigma$. The probability element in $u$ and $v$ is then

$$\frac{1}{2\pi\sigma^2}\exp\Big\{-\frac{1}{2\sigma^2}\big[(u - m_1)^2 + (v - m_2)^2\big]\Big\}\,du\,dv. \tag{9}$$

Changing variables to $r$, $\theta$ and setting $m_1 = \bar r\cos\bar\theta$,
$m_2 = \bar r\sin\bar\theta$ we obtain the following probability element:

$$\frac{r}{2\pi\sigma^2}\exp\Big\{-\frac{1}{2\sigma^2}\big[(r\cos\theta - \bar r\cos\bar\theta)^2 + (r\sin\theta - \bar r\sin\bar\theta)^2\big]\Big\}\,dr\,d\theta.$$

Completing the square in $r$ and substituting $\phi = \theta - \bar\theta$
there is obtained

$$\frac{r}{2\pi\sigma^2}\exp\Big\{-\frac{1}{2\sigma^2}(r - \bar r\cos\phi)^2\Big\}\exp\Big\{-\frac12\Big(\frac{\bar r\sin\phi}{\sigma}\Big)^2\Big\}\,dr\,d\phi. \tag{10}$$

To integrate out $r$ make the further change of variable

$$t = \frac{r}{\sigma} - \frac{\bar r}{\sigma}\cos\phi.$$

Setting $\frac{\bar r}{\sigma}\cos\phi = w$ for convenience in notation there
is obtained

$$\frac{1}{2\pi}\exp\Big\{-\frac12\Big(\frac{\bar r\sin\phi}{\sigma}\Big)^2\Big\}\Big[t\exp\Big\{-\frac{t^2}{2}\Big\} + w\exp\Big\{-\frac{t^2}{2}\Big\}\Big]\,dt\,d\phi. \tag{11}$$

The variable $t$ is to be integrated out of this expression. The corresponding
limits of integration are exhibited by

$$\frac{1}{2\pi}\exp\Big\{-\frac12\Big(\frac{\bar r\sin\phi}{\sigma}\Big)^2\Big\}\Big[\int_{-w}^{+\infty} t\exp\Big\{-\frac{t^2}{2}\Big\}\,dt + w\int_{-w}^{+\infty}\exp\Big\{-\frac{t^2}{2}\Big\}\,dt\Big]\,d\phi. \tag{12}$$

Now as the number of points in the original sample increases the value of
$\bar r$ also increases and as $\sigma/\bar r \to 0$, with
$|\phi| < \frac{\pi}{2}$, the value of $w \to \infty$. In this case then (12)
approaches asymptotically to

$$\frac{1}{\sqrt{2\pi}\,\sigma/\bar r}\exp\Big\{-\frac12\Big(\frac{\sin\phi}{\sigma/\bar r}\Big)^2\Big\}\,d(\sin\phi).$$

As $\sigma/\bar r \to 0$ this distribution shows that $\phi$ converges in
probability to zero and that the distribution approaches asymptotically to the
normal form

$$\frac{1}{\sqrt{2\pi}\,\sigma/\bar r}\exp\Big\{-\frac12\Big(\frac{\phi}{\sigma/\bar r}\Big)^2\Big\}\,d\phi. \tag{13}$$

It is required then to examine the conditions under which $\sigma/\bar r$
assumes small values. If the variance of the original variables $x_i$ and
$y_i$ is designated by $\sigma_1^2$










then since $u$ and $v$ are linear functions of $x_i$ and $y_i$ respectively the
variance of $u$ and of $v$ is

$$\sigma^2 = \sigma_1^2\sum_{i=1}^{n} N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)^2. \tag{14}$$

Now $\bar r^2$ is the sum of the squares of the means of $u$ and $v$ so that

$$\bar r^2 = (1 + b^2)\Big\{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)X_i\Big]\Big\}^2. \tag{15}$$

Dividing (14) by (15) we obtain

$$\Big(\frac{\sigma}{\bar r}\Big)^2 = \frac{\sigma_1^2\sum_{i=1}^{n} N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)^2}{(1 + b^2)\Big\{\sum_{i=1}^{n}\Big[N_i\Big(\sum_{j=1}^{n} N_j - 2\sum_{j=1}^{i} N_j + N_i\Big)X_i\Big]\Big\}^2}. \tag{16}$$

Inspection of (16) indicates that as the number of sample points $N_i$
increases the value of $(\sigma/\bar r)^2$ decreases rapidly. To illustrate
this we examine some particular cases. Consider first the case of four equally
spaced means $X_i = 3i\sigma_1$, $(i = 1, 2, 3, 4)$ and let there be one sample
point for each mean $(N_i = 1)$. With these values there is obtained,

$$\Big(\frac{\sigma}{\bar r}\Big)^2 = \frac{0.022}{1 + b^2}.$$

For $b = 1$ the range $-9^\circ < \phi < +9^\circ$ includes 95% of the
population defined by (13). As the number of points $N_i$ is increased or as
the number of means $X_i$ is increased the value of $(\sigma/\bar r)^2$
decreases rapidly. Consider now eight equally spaced means $X_i = 3i\sigma_1$,
$(i = 1, 2, \cdots, 8)$ with again one sample point for each mean $(N_i = 1)$.
With these values there is obtained

$$\Big(\frac{\sigma}{\bar r}\Big)^2 = \frac{0.00045}{1 + b^2}.$$

For $b = 1$ the range $-1^\circ < \phi < +1^\circ$ includes 95% of the
population defined by (13).
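The ratio in (16) can be evaluated mechanically for a given grouping. The sketch below is an illustration only: the function name `precision_ratio` is hypothetical, the means are entered in units of $\sigma_1$, and the variance of each group mean is assumed to be $\sigma_1^2/N_i$. It is applied to the first configuration above, four means spaced $3\sigma_1$ apart with one point each.

```python
from fractions import Fraction

def precision_ratio(counts, means_in_sigma1):
    """Return c such that (sigma / r_bar)^2 = c / (1 + b^2), following (16).

    counts          -- the N_i, in order of increasing mean
    means_in_sigma1 -- the means X_i expressed in units of sigma_1
    """
    total = sum(counts)
    running = 0
    var_sum = Fraction(0)    # sum of N_i * w_i^2  (variance of u over sigma_1^2)
    mean_sum = Fraction(0)   # sum of N_i * w_i * X_i  (mean of u over sigma_1)
    for n_i, x_i in zip(counts, means_in_sigma1):
        running += n_i
        w_i = total - 2 * running + n_i
        var_sum += n_i * w_i ** 2
        mean_sum += n_i * w_i * Fraction(x_i)
    return var_sum / mean_sum ** 2

# Four means X_i = 3 * i * sigma_1 with one sample point each.
c4 = precision_ratio([1] * 4, [3 * i for i in range(1, 5)])
print(float(c4))
```

For this configuration the weights are 3, 1, -1, -3, so the numerator sum is 20 and the squared denominator sum is 900, which gives the coefficient 0.022 quoted in the text.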

It is clear that a very high degree of precision is obtained with the estimate
$\hat b$ when there is a considerable number of sample points. However, this
will also be true in general of other statistics and it is really of interest
to compare precisions in those cases where the statistics have a relatively
low precision. A detailed comparison is beyond the scope of this paper.
However, a direct comparison can be made very easily in the particular case
when $x_i$ is a fixed variate













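and only $y_i$ is a random variable. For the sake of simplicity, let each
$N_i = 1$; then the statistic for estimating $b$ is

$$\hat b = \frac{\sum_{i=1}^{n} i\,(y_i - \bar y)}{\sum_{i=1}^{n} i\,(x_i - \bar x)} = \frac{\sum_{i=1}^{n} y_i\,(i - \bar i)}{\sum_{i=1}^{n} x_i\,(i - \bar i)}. \tag{17}$$

Since $\hat b$ is a linear function of the $y_i$, by a well known theorem its
variance is

$$\sigma_{\hat b}^2 = \sigma_y^2\,\frac{\sum_{i=1}^{n} (i - \bar i)^2}{\Big[\sum_{i=1}^{n} x_i\,(i - \bar i)\Big]^2}. \tag{18}$$

The customary least squares regression line of $y$ on $x$ gives for the
estimate of $b$ and its variance

$$b_R = \frac{\sum y_i\,(x_i - \bar x)}{\sum x_i\,(x_i - \bar x)}, \qquad \sigma_{b_R}^2 = \frac{\sigma_y^2}{\sum x_i\,(x_i - \bar x)}.$$

In the particular case when the $x_i$ are equally spaced, $x_i = ci + d$, the
estimates $\hat b$ and $b_R$ are identical:

$$\hat b = b_R = \frac{\sum y_i\,(i - \bar i)}{c\sum (i - \bar i)^2}. \tag{19}$$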
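The agreement of the two estimates for equally spaced $x_i$ can be checked with exact arithmetic. The following sketch is illustrative only: the function names and the data are hypothetical, with one point per mean and $x_i = 2i + 5$ as an arbitrary equally spaced choice.

```python
from fractions import Fraction

def rank_weight_slope(x, y):
    """Slope from (17): weights (i - i_bar) applied to y and to x."""
    n = len(x)
    i_bar = Fraction(n + 1, 2)
    num = sum(yi * (i - i_bar) for i, yi in enumerate(y, start=1))
    den = sum(xi * (i - i_bar) for i, xi in enumerate(x, start=1))
    return num / den

def least_squares_slope(x, y):
    """Customary least squares slope of y on x."""
    x_bar = Fraction(sum(x), len(x))
    num = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    den = sum(xi * (xi - x_bar) for xi in x)
    return num / den

y = [Fraction(v) for v in (9, 12, 11, 15, 18, 17)]   # hypothetical observations

x_equal = [2 * i + 5 for i in range(1, 7)]           # equally spaced: c = 2, d = 5
same = rank_weight_slope(x_equal, y) == least_squares_slope(x_equal, y)

x_unequal = [1, 2, 4, 8, 9, 15]                      # unequally spaced
differ = rank_weight_slope(x_unequal, y) != least_squares_slope(x_unequal, y)
```

With equal spacing the deviations $x_i - \bar x$ are proportional to $i - \bar i$, so the two ratios coincide exactly; with unequal spacing they generally differ.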


6. Numerical example. From a practical point of view the case where $x$ and
$y$ are random variables is of greater interest than where $x$ is a fixed
variate. We give a numerical example of this case comparing the statistic
$\hat b$ with several other statistics. Consider the case where there is one
sample point for each mean $X_i$. We shall evaluate the following:

1). The statistic of this paper which for this case is

$$b_1 = \frac{\sum y_i\,(i - \bar i)}{\sum x_i\,(i - \bar i)}.$$

2). The statistic obtained by minimizing the sum of the squares of the $y$
deviations only

$$b_2 = \frac{\sum y_i\,(x_i - \bar x)}{\sum x_i\,(x_i - \bar x)}.$$

3). The statistic obtained by minimizing the sum of the squares of the
orthogonal deviations

$$b_3 = \frac{\sum (y_i - \bar y)^2 - \sum (x_i - \bar x)^2 + \Big\{\Big[\sum (y_i - \bar y)^2 - \sum (x_i - \bar x)^2\Big]^2 + 4\Big[\sum (y_i - \bar y)(x_i - \bar x)\Big]^2\Big\}^{1/2}}{2\sum (y_i - \bar y)(x_i - \bar x)}.$$

4). The statistic proposed by Wald²

$$b_4 = \frac{\sum_{i=n/2+1}^{n} y_i - \sum_{i=1}^{n/2} y_i}{\sum_{i=n/2+1}^{n} x_i - \sum_{i=1}^{n/2} x_i}.$$

We apply these statistics to sample data having four means $X_i = i$ and
$Y_i = i$, $(i = 1, 2, 3, 4)$. By means of a table of random numbers seven sets
of data were obtained, each set having one sample point corresponding to each
mean. These sample points are described by Table I where it will be noted that
the sample points were drawn from a discrete distribution. The estimates
obtained from the four statistics are exhibited in Table II.

TABLE I

Set     x1    y1    x2    y2    x3    y3    x4    y4
 1     1.1   1.4   2.4   2.0   3.0   2.7   3.6   4.3
 2     1.2   1.4   2.2   2.0   3.4   3.1   3.8   4.2
 3     1.0   1.4   1.6   2.1   2.8   3.2   4.4   4.3
 4     0.6   0.7   1.8   2.0   3.3   2.6   3.8   4.0
 5     0.7   1.4   1.7   1.7   2.7   3.4   4.1   4.4
 6     1.0   1.2   1.6   2.1   2.9   2.6   3.6   4.0
 7     1.3   0.7   1.7   2.1   2.7   2.9   4.0   3.6

TABLE II

Set          b1       b2       b3       b4
 1         1.160    1.068    1.220    1.162
 2         1.056    1.009    1.059    1.027
 3         0.860    0.843    0.803    0.870
 4         0.946    0.896    0.924    0.830
 5         0.875    0.867    0.913    1.000
 6         0.978    0.939    0.981    0.846
 7         1.044    0.959    1.045    1.000
Mean       0.990    0.940    0.996    0.962
Variance   0.0686   0.0373   0.1058   0.0834

² Loc. cit.

If the 28 sample points are treated as a single set of data and the four
statistics in their appropriate forms are applied, there is obtained the
following set of estimates:

  b1        b2        b3        b4
0.9768    0.9183    0.9786    0.9496

The preceding computations show that the estimate $b_2$ is inferior to the
other estimates, as would be expected. The estimate $b_3$ is most accurate when
the 28 sample points are treated as a single set of data, with the estimate
$b_1$ being only very slightly less accurate, $b_1 = 0.9768$ as compared to
$b_3 = 0.9786$. When the individual sets of sample points 1 to 7 are considered
it is seen that the estimate $b_3$ is most accurate with the estimate $b_1$
rather less accurate; the estimate $b_1$ is more precise than $b_3$, the sample
variances being in the ratio $0.0686 \div 0.1058 = 0.65$. From a practical
viewpoint we may also point out that the computation of $b_1$ requires very
much less labor than the computation of $b_3$.
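As an illustration of how little computation the rank-weighted statistic needs, the sketch below evaluates three of the four statistics for the first data set. It rests on assumptions: the function names are hypothetical, the values are those of set 1 as read from Table I (the first entry, damaged in the source, is taken as 1.1), and the orthogonal-deviation statistic is left out.

```python
def b1(x, y):
    """Statistic of this paper for one point per mean: weights (i - i_bar)."""
    n = len(x)
    i_bar = (n + 1) / 2
    num = sum(yi * (i - i_bar) for i, yi in enumerate(y, start=1))
    den = sum(xi * (i - i_bar) for i, xi in enumerate(x, start=1))
    return num / den

def b2(x, y):
    """Least squares slope, minimizing squared y deviations only."""
    x_bar = sum(x) / len(x)
    num = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    den = sum(xi * (xi - x_bar) for xi in x)
    return num / den

def b4(x, y):
    """Wald's statistic: difference of half-sums of y over difference of half-sums of x."""
    half = len(x) // 2
    return (sum(y[half:]) - sum(y[:half])) / (sum(x[half:]) - sum(x[:half]))

# Set 1 of Table I (x_1 read as 1.1).
x = [1.1, 2.4, 3.0, 3.6]
y = [1.4, 2.0, 2.7, 4.3]
estimates = {"b1": b1(x, y), "b2": b2(x, y), "b4": b4(x, y)}
```

The three values agree with the first row of Table II to within rounding in the third decimal place.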











ON THE EFFECT OF DECIMAL CORRECTIONS ON ERRORS OF 
OBSERVATION 


By PHILIP HARTMAN AND AUREL WINTNER


The Johns Hopkins University 


1. Summary. Let $t$ be the true value of what is being measured and suppose
that the error of observation is a symmetric normal distribution of standard
deviation $\sigma$. The "rounding-off" error due to the reading of
measurements to the nearest unit has a distribution and an expected value
depending on $t$ and $\sigma$. It is shown that, for a fixed $\sigma > 0$, the
expected value of the decimal correction, $r(t; \sigma)$, is an analytic
function of $t$ which is odd, of period 1, positive for $0 < t < \frac12$, and
has a convex arch as its graph on $0 \le t \le \frac12$. Furthermore, if
$0 < t < \frac12$, both $r(t; \sigma)$ and its maximum value,
$\max_t r(t; \sigma)$, are decreasing functions of $\sigma$.


2. Introduction. Let $X$ be an error of observation and let $\phi(x)$ denote
the density of probability of the distribution of $X$. In particular,

$$\int_{-\infty}^{+\infty} \phi(x)\,dx = 1, \quad \text{where } \phi(x) \ge 0. \tag{1}$$

If $t$ is any fixed number, the density of probability of the distribution of
$X + t$ is $\phi(x - t)$.

Besides the "instrumental error of observation", $X$, there is another error,
that of the "rounding-off", which is carried along in the registration of the
measurements. It is introduced by the circumstance that, if $\cdots, b, a$ are
digits, and if $b$ denotes the last digit considered, then decimal fractions
such as $\cdots ba$ and $\cdot ba \cdots$ are registered as $\cdots b$ if
$a < 5$ and as $\cdots (b + 1)$ if $a > 5$. Let the unit, in which the
measurements are expressed, be so chosen that the first digit neglected
becomes the first digit following the decimal point, i.e., that the error of
the "rounding-off" is between $\pm\frac12$. Then, if $t$ denotes the true
value of what is being measured, the remark made after (1) shows that the
probability that the error of the decimal corrections be less than $x$ is
given by

$$\sum_{n=-\infty}^{\infty} \int_{n-\frac12}^{n+x} \phi(u - t)\,du,$$

if $|x| < \frac12$, whereas this probability is 0 or 1 according as
$x < -\frac12$ or $x > \frac12$.

Since the last series can be written in the form

$$\sum_{n=-\infty}^{\infty} \int_{-\frac12}^{x} \phi(u + n - t)\,du = \int_{-\frac12}^{x} \sum_{n=-\infty}^{\infty} \phi(u + n - t)\,du, \qquad (\phi \ge 0), \tag{2}$$

it follows that the density of probability of the error due to the decimal
corrections is

$$\sum_{n=-\infty}^{\infty} \phi(x + n - t) \ \text{ if } |x| < \tfrac12, \text{ and } 0 \text{ if } |x| > \tfrac12. \tag{3}$$




Consequently, if $r = r(t)$ denotes the expected value of the decimal error
induced on the "true" value, $t$, of the observations, then

$$r(t) = \int_{|x| < \frac12} x \sum_{n=-\infty}^{\infty} \phi(x + n - t)\,dx. \tag{4}$$

Formula (4) is known¹. It is usually based on its intuitive interpretation
which results if, on the one hand, (4) is written in the form

$$r(t) = \int_{-\infty}^{+\infty} s(x)\,\phi(x - t)\,dx, \tag{5}$$

where

$$s(x) = x \text{ if } -\tfrac12 < x < \tfrac12 \quad \text{and} \quad s(x) = s(x + 1), \quad -\infty < x < \infty, \tag{6}$$

and, on the other hand, the periodic function (6) is thought of as
representing the uniform distribution of the error of "rounding-off" over the
arithmetical continuum over a period,

$$|x - n| < \tfrac12, \qquad (n = 0, \pm 1, \cdots),$$

on the $x$-axis. Needless to say, the specification of $s(x)$ at the points
$x = n + \frac12$, which are disregarded in the definition (6), is immaterial,
since $s(x)$ occurs in (5) only as an integrable weight-factor, isolated
values of which do not influence the integral.

It follows at once from (1), (5) and the continuity (almost everywhere) of
(6), that $r(t)$ is continuous.


3. Fourier analysis of $r(t)$. Since the Fourier expansion of the periodic
function (6) is

$$s(x) = -\pi^{-1}\sum_{n=1}^{\infty} (-1)^n n^{-1}\sin 2\pi n x = s(x + 1) = \cdots, \qquad (|x| < \tfrac12), \tag{7}$$

it follows from (5) that²

$$r(t) = -\pi^{-1}\sum_{n=1}^{\infty} (-1)^n n^{-1}\int_{-\infty}^{+\infty} \phi(x)\sin 2\pi n(x + t)\,dx. \tag{8}$$

Hence, if the sine in (8) is expressed in terms of $2\pi n x$ and $2\pi n t$,

$$\pi r(t) = -\sum_{n=1}^{\infty} (-1)^n n^{-1}(a_n\cos 2\pi n t + b_n\sin 2\pi n t), \tag{9}$$


¹ F. Zernike, "Wahrscheinlichkeitsrechnung und mathematische Statistik,"
Handbuch der Physik, Vol. 3 (1928), pp. 475-476.

² In view of (1), the term-by-term integration leading from (5) to (8) is
justified by the fact that the partial sums of the series (7) are uniformly
bounded. Correspondingly, the above deduction of (9) and (10) from (4) is
equivalent to an application of Poisson's summation formula. In this regard,
cf. A. Wintner, "The sum formulae of Euler-Maclaurin and the inversions of
Fourier and Möbius," Am. Jour. of Math., Vol. 69 (1947), pp. 685-708, the end
of §1 (p. 687) and its application on p. 697.





EFFECT OF DECIMAL CORRECTIONS 


where

$$b_n + i a_n = \int_{-\infty}^{+\infty} \phi(x)\exp(2\pi i n x)\,dx, \qquad (n = 1, 2, \cdots). \tag{10}$$

Let it be assumed that positive and negative errors of observation, when of
the same magnitude, are equally probable, i.e., that $\phi(x) = \phi(-x)$.
Then (10) shows that $a_n$ becomes 0. Hence, (9) reduces to

$$r(t) = -\sum_{n=1}^{\infty} (-1)^n (c_n/n)\sin 2\pi n t, \tag{11}$$

where

$$c_n = \pi^{-1}\int_{-\infty}^{+\infty} \phi(x)\cos 2\pi n x\,dx = 2\pi^{-1}\int_{0}^{+\infty} \phi(x)\cos 2\pi n x\,dx. \tag{12}$$

Clearly, $r(t)$ is an odd function whenever the density $\phi(x)$ is even.


4. The normal case. Suppose that $\phi(x)$ is the density of a symmetric
normal (Gaussian) distribution. Then, if $\sigma$ is the positive constant
representing the standard deviation of the errors of observation,

$$\phi(x) = (2\pi\sigma^2)^{-1/2}\exp(-\tfrac12 x^2/\sigma^2) \qquad (0 < \sigma < \infty). \tag{13}$$

It is clear from (5) and (6) that

$$r(t) \to s(t) \ \text{ if } \sigma \to 0 \text{ in (13)}. \tag{14}$$

Actually, all that (14) says is a triviality, according to which the total
error becomes the decimal error when the measurements become infinitely sharp.
In this limiting case, that is, if $r(t) = s(t)$, it is seen from (6) that the
graph of the periodic function $r = r(t)$ is piecewise linear, and therefore
discontinuous.

If $\sigma = 0$ is replaced by $0 < \sigma < \infty$, the jumps of $r(t)$ at
$t = n - \frac12$ disappear (cf. the end of §3) and, as will be proved below,

(I) $r(t)$ is an analytic function which is odd, of period 1, and positive for
$0 < t < \frac12$ (hence negative for $-\frac12 < t < 0$), and

(II) the graph of $r = r(t)$ over the fundamental interval
$0 \le t \le \frac12$ is a convex arch, no matter what the value of $\sigma$
in (13) may be.

Since $r$ now depends both on the "true" value, $t$, of the observations and
the "precision", $\sigma$, of the measurements, let $r$ be denoted by
$r(t; \sigma)$. It will be shown that

(i) $\max r(t; \sigma)$, where the Max refers to $t$ while $\sigma$ is fixed,
is a decreasing function of $\sigma$, where $\sigma$ varies on the half-line
$0 < \sigma < \infty$; and that, on the same half-line,

(ii) $r(t; \sigma)$ is a decreasing function of $\sigma$ at every fixed $t$
contained in the fundamental region $0 < t < \frac12$.

All of this seems to be clear for physical reasons. Actually, it is easy to
give examples of distribution laws distinct from (13) for which the above
assertions become false.





5. The $\vartheta_3$-function. As is well-known,

$$\int_{-\infty}^{+\infty} \exp(-\tfrac12 x^2/\sigma^2)\cos ux\,dx = (2\pi\sigma^2)^{1/2}\exp(-\tfrac12\sigma^2 u^2).$$

Hence, the value of the integral (12) is $q^{n^2}$, if $q$ is an abbreviation
for

$$q = \exp(-2\pi^2\sigma^2). \tag{15}$$

Consequently, if $r(t, q)$ is defined, in terms of the above $r(t; \sigma)$,
by placing

$$r(t, q) = r(t; \sigma) \ \text{ in virtue of (15)}, \tag{16}$$

then (11) shows that³

$$r(t, q) = -\pi^{-1}\sum_{n=1}^{\infty} (-1)^n n^{-1} q^{n^2}\sin 2\pi n t. \tag{17}$$

It will be noted that the range, $0 < \sigma < \infty$, of the standard
deviation is mapped by (15) on the range

$$0 < q < 1, \tag{18}$$

and that $\sigma$ decreases or increases according as $q$ increases or
decreases.
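The series (17) converges very quickly for moderate $\sigma$, so the expected decimal correction can be tabulated directly and compared with a numerical evaluation of the integral (5). The following sketch is only an illustration: the function names, the midpoint-rule step and the truncation at 50 terms are assumptions, and the truncation is adequate only when $\sigma$ is not very small.

```python
import math

def r_series(t, sigma, terms=50):
    """Expected decimal correction from the series (17), with q = exp(-2 pi^2 sigma^2)."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2)
    total = 0.0
    for n in range(1, terms + 1):
        total += (-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
    return -total / math.pi

def sawtooth(x):
    """The periodic function s(x) of (6): x minus the nearest integer."""
    return x - math.floor(x + 0.5)

def r_direct(t, sigma, step=1e-3, width=8.0):
    """Midpoint-rule approximation of the integral (5) for the normal density (13)."""
    lo = t - width * sigma
    n_steps = int(round(2.0 * width * sigma / step))
    total = 0.0
    for k in range(n_steps):
        x = lo + (k + 0.5) * step
        dens = math.exp(-0.5 * ((x - t) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += sawtooth(x) * dens * step
    return total

value_series = r_series(0.2, 0.3)
value_direct = r_direct(0.2, 0.3)
```

For instance, evaluating `r_series(0.2, s)` for `s` in `(0.2, 0.3, 0.5)` gives a decreasing sequence of positive values, in line with assertion (ii).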

Let partial differentiations with respect to $t$ and $q$ be denoted by primes
and subscripts, respectively:

$$f' = \partial f/\partial t, \qquad f_q = \partial f/\partial q. \tag{19}$$

Thus, from (17),

$$r'(t, q) = -2\sum_{n=1}^{\infty} (-1)^n q^{n^2}\cos 2\pi n t \tag{20}$$

and, as easily verified from (17),

$$r_q(t, q) = (-4\pi^2 q)^{-1}\, r''(t, q). \tag{21}$$

Let $\theta(t, q)$ be defined by

$$\theta(t, q) = 1 + 2\sum_{n=1}^{\infty} q^{n^2}\cos nt \tag{22}$$

(so that $\theta(t, q)$ is, in the main, the elliptic theta-function usually
denoted by $\vartheta_3$). It is known that

$$\theta(t, q) > 0 \tag{23}$$

and that⁴

$$\theta'(t, q) < 0 \text{ if } 0 < t < \pi \quad (\text{hence, } \theta'(t, q) > 0 \text{ if } -\pi < t < 0). \tag{24}$$

The above assertions will be deduced from these facts.


³ Cf. F. Zernike, loc. cit.
⁴ For a simple proof, cf. A. Wintner, "On the shape of the angular case of
Cauchy's distribution," Annals of Math. Stat., Vol. 18 (1948), pp. 589-593, §6.







6. Proof of (I)-(II) and (i)-(ii). First, it is seen from (17) and (22) that

$$r'(t, q) = 1 - \theta(2\pi t - \pi, q). \tag{25}$$

Hence,

$$r''(t, q) = -2\pi\,\theta'(2\pi t - \pi, q). \tag{26}$$

If (26) is compared with (24), it is seen that

$$r''(t, q) < 0 \text{ if } 0 < t < \tfrac12 \quad (\text{hence, } r''(t, q) > 0 \text{ if } -\tfrac12 < t < 0). \tag{27}$$

Consequently, (I) and (II) follow, since, in view of (17),

$$r(\pm\tfrac12, q) = 0 = r(0, q). \tag{28}$$

Next, (21) and (27) imply that

$$r_q(t, q) > 0 \text{ for } 0 < t < \tfrac12. \tag{29}$$

Hence, (ii) follows from the fact that $q$ is a decreasing function of
$\sigma$.
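The relation (25) between $r'$ and the theta series, and the sign statement (29), lend themselves to a quick numerical check. The sketch below is illustrative only: the function names, the value $q = 0.3$, the truncation at 50 terms and the finite-difference step are all assumptions.

```python
import math

def theta(t, q, terms=50):
    """Truncated theta series (22): 1 + 2 * sum q^(n^2) cos(n t)."""
    return 1.0 + 2.0 * sum(q ** (n * n) * math.cos(n * t) for n in range(1, terms + 1))

def r_value(t, q, terms=50):
    """Truncated series (17) for r(t, q)."""
    s = sum((-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
            for n in range(1, terms + 1))
    return -s / math.pi

def r_prime(t, q, terms=50):
    """Truncated series (20) for the t-derivative of r(t, q)."""
    return -2.0 * sum((-1) ** n * q ** (n * n) * math.cos(2.0 * math.pi * n * t)
                      for n in range(1, terms + 1))

q = 0.3
points = (0.1, 0.25, 0.4)

# Largest discrepancy between the two sides of (25) at the test points.
max_gap = max(abs(r_prime(t, q) - (1.0 - theta(2.0 * math.pi * t - math.pi, q)))
              for t in points)

# Forward differences in q approximate r_q of (29) on 0 < t < 1/2.
h = 1e-4
q_slopes = [(r_value(t, q + h) - r_value(t, q)) / h for t in points]
```

The gap in (25) is at the level of rounding error, and the forward differences in $q$ come out positive at each test point, as (29) requires.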
As to (i), let $t = t(q)$ denote that (unique) $t$-value on $0 < t < \frac12$
at which $r(t, q)$ assumes its maximum value, say $r^*$; so that

$$r^* = r(t(q), q), \qquad (0 < t(q) < \tfrac12). \tag{30}$$

Clearly, $t = t(q)$ is the only $t$-value on $0 < t < \frac12$ for which

$$r'(t, q) = 0. \tag{31}$$

Since $r'(t, q)$ possesses continuous partial derivatives with respect to $t$
and $q$, and since (27) implies that its partial derivative with respect to
$t$, namely, $r''(t, q)$, does not vanish at $t = t(q)$, it follows that the
solution $t = t(q)$ of the equation (31) possesses a continuous derivative.
Hence, the function (30) possesses a continuous derivative with respect to
$q$, namely,

$$\frac{dr^*}{dq} = r'(t(q), q)\,\frac{dt(q)}{dq} + r_q(t(q), q). \tag{32}$$

But since $t = t(q)$ is a solution of (31), the identity (32) can be reduced to

$$\frac{dr^*}{dq} = r_q(t(q), q), \qquad (0 < t(q) < \tfrac12).$$

Consequently, (i) follows from (29), since $q$ is a decreasing function of
$\sigma$.














WEIGHING DESIGNS AND BALANCED INCOMPLETE BLOCKS 


By K. S. BANERJEE
Pusa, Bihar, India 


1. Introduction. Following a paper by Hotelling [1] on the weighing problem,
Kishen [4] and Mood [2] furnished generalized solutions. This note consists of
some additional remarks on the weighing problem when the weighing is
restricted to be made on one pan.

Hotelling remarked that when the problem was to determine a particular
difference or any other linear function of the weights, a different design
should be sought to minimize the variance. An account of efficient designs of
this kind has also been furnished in this note. The notations used by
Hotelling and Mood have been used here.


2. Chemical balance problem. It has been shown by Mood that when
$N \equiv 0 \pmod 4$, an optimum design exists if a Hadamard matrix $H_N$
exists, and is obtained by using any $p$ columns of $H_N$. When
$N \equiv r \pmod 4$, $(r = 1, 2, 3)$, very efficient designs are obtained
either by adding to or deleting from the rows of a Hadamard matrix, making the
resultant number of rows equal to $N$.

It has further been shown by Mood in connection with this class of designs
that arrangements¹ are available which are more efficient than the one
obtained by repeating the row of ones. As a matter of fact, if any row other
than the row of ones be repeated, this will lead to a design of the same
efficiency as in the case of repeated addition of the row of ones; for, the
determinant of $X'X$ will remain exactly identical. That this is so will be
clear from the following properties showing the connection of the matrix $X$
with the determinant $|a_{ij}|$.

(i) Any two rows of the matrix $X$ can be interchanged without changing the
determinant $|a_{ij}|$.

(ii) Any two columns of the matrix $X$ can be interchanged without changing
the determinant $|a_{ij}|$.

(iii) The signs of all the elements in a column of the matrix $X$ may be
changed without changing the determinant $|a_{ij}|$.


3. Spring balance problem. Mood has exhaustively discussed the designs 
when N > p. Efficient designs under this class will, however, be available from 
the arrangements afforded by balanced incomplete block designs discussed in 
[3]. These designs will be represented by certain of the efficient submatrices
of the $P_k$ of Mood.

Usually v and b are used to denote respectively the number of varieties and the 
number of blocks in the above mentioned designs. Here v will take the place of 


1 This had been independently shown by me before the paper of A. M. Mood was brought 
to my notice by H. Hotelling. 




WEIGHING DESIGNS 395 


$p$, the number of objects to be weighed and $b$ that of $N$, the number of
weighings that can be made. The matrix $X'X$ in this case will take the form

$$\begin{pmatrix}
r & \lambda & \lambda & \cdots & \lambda \\
\lambda & r & \lambda & \cdots & \lambda \\
\lambda & \lambda & r & \cdots & \lambda \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\lambda & \lambda & \lambda & \cdots & r
\end{pmatrix} \tag{1}$$

The variance of the estimated weight of each of the $p$ objects for such a
design can be easily seen to be

$$\frac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}\,\sigma^2, \tag{2}$$

where $p$ is the number of objects to be weighed and $r$ and $\lambda$ have
meanings similar to those in connection with balanced incomplete block
designs; that is, $r$ is the number of times each object is weighed, and
$\lambda$ is the number of times each pair of objects is weighed together.

Though the minimum minimorum of $\sigma^2/N$ can never be attained by the
objects to be weighed under such designs, $\sigma^2/N$ may however be kept as
the standard with which the efficiency of a given design may be calculated.
The efficiency of the above design will therefore for zero bias be

$$\frac{(r - \lambda)\{r + \lambda(p - 1)\}}{N\{r + \lambda(p - 2)\}}. \tag{3}$$

The identities well known in the theory of balanced incomplete blocks,

$$bk = vr, \qquad \lambda(v - 1) = r(k - 1),$$

may, upon replacing $b$ by $N$ and $v$ by $p$ to accord with the notation of
weighing designs, be written

$$r = Nk/p, \qquad \lambda = r(k - 1)/(p - 1).$$

Upon substituting these in (3) we obtain the efficiency factor in the form

$$\frac{k^2(p - k)}{p(pk - 2k + 1)}, \tag{4}$$

where $k$ is the number of plots per block or the number of objects that can
be weighed at a time.
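The reduction of (3) to (4) can be confirmed with exact arithmetic for particular designs. The sketch below is only an illustration: the function names are hypothetical, and the two parameter sets are examples of balanced incomplete block parameters satisfying $r = Nk/p$ and $\lambda = r(k-1)/(p-1)$.

```python
from fractions import Fraction

def efficiency_from_design(N, p, r, lam):
    """Efficiency (3): (r - lam){r + lam(p - 1)} / (N {r + lam(p - 2)})."""
    return Fraction((r - lam) * (r + lam * (p - 1)), N * (r + lam * (p - 2)))

def efficiency_from_block_size(p, k):
    """Efficiency (4): k^2 (p - k) / (p (pk - 2k + 1))."""
    return Fraction(k * k * (p - k), p * (p * k - 2 * k + 1))

# Example parameter sets (N, p, k, r, lambda).
designs = [(7, 7, 4, 4, 2), (15, 15, 8, 8, 4)]
pairs = [(efficiency_from_design(N, p, r, lam), efficiency_from_block_size(p, k))
         for (N, p, k, r, lam) in designs]
```

Both forms give 16/49 for the first design and 64/225 for the second, so that the efficiency indeed depends only on $p$ and $k$.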


If instead of adopting repetitions of $P_k$, only $\binom{p}{k}$ weighings be
made in all, the efficiency factor calculated for such a combinatorial design
would be

$$\frac{(r - \lambda)\{r + \lambda(p - 1)\}}{\binom{p}{k}\{r + \lambda(p - 2)\}}, \quad \text{for zero bias,}$$

where

$$r = \binom{p - 1}{k - 1}, \qquad \lambda = \binom{p - 2}{k - 2}$$

and $b = \binom{p}{k}$. The above expression on simplification reduces to (4).


It will be noticed that the efficiency of such designs depends only upon the
total number of objects to be weighed and the number of such objects that can
be weighed at a time.

These designs have the advantage that all the weights are estimated with equal
precision. If a slightly larger number of weighings than what is afforded by
the number of blocks in a balanced incomplete block design has to be made, all
the objects may be weighed together and this weighing be repeated as many
times as required. This will be equivalent to the repeated addition of the row
of ones. The repetition of the row of ones in particular is necessary to make
the weights estimable with equal precision, which however, may be demanded at
times as a matter of necessity in certain experiments. Otherwise, any other
single row or different rows of the matrix $X$ may be repeated, making the
number of rows of the matrix $X$ equal to the number of weighings proposed to
be made in all.

From the practical point of view also, it will be advantageous to connect the
designs for weighing with the already existing balanced incomplete block
designs, which have been highly developed in recent years and are being
extensively used in agro-biological investigations.

4. Spring balance design for small p. Under this class of designs, Mood has
found the most efficient design for $p = 7$. It is given by

$$L_7 = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}.$$

This $L_7$ is easily recognized to be the design for $k = 4$, $b = 7$,
$v = 7$, $r = 4$, $\lambda = 2$, given by an orthogonal series [3]. It is
therefore seen that Hadamard matrices will lead to a new method of
constructing balanced incomplete block designs of a certain class. For example
$H_{16}$ and $H_{20}$ will lead respectively to the designs for $k = 8$,
$b = 15$, $v = 15$, $r = 8$, $\lambda = 4$ (or for $k = 7$, $b = 15$,
$v = 15$, $r = 7$, $\lambda = 3$) and for $k = 10$, $b = 19$, $v = 19$,
$r = 10$, $\lambda = 5$ (or $k = 9$, $b = 19$, $v = 19$, $r = 9$,
$\lambda = 4$). These designs also satisfy the condition of maximum
efficiency, by virtue of the fact that $|L_N|$ will have the value

$$\frac{(N + 1)^{(N+1)/2}}{2^N}$$

as shown by Mood.
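The balanced incomplete block structure of the seven-object design can be verified directly from its rows. In the sketch below the rows are those of the matrix for $p = 7$ given above; the function name `determinant` and the exact-elimination approach are illustrative choices, not part of the original note.

```python
from fractions import Fraction
from itertools import combinations

ROWS = ["1010101", "0110011", "0001111", "1100110",
        "0111100", "1011010", "1101001"]
X = [[int(c) for c in row] for row in ROWS]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                       # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

block_sizes = [sum(row) for row in X]                        # k for each weighing
replications = [sum(row[j] for row in X) for j in range(7)]  # r for each object
pair_counts = {sum(row[i] * row[j] for row in X)             # lambda for each pair
               for i, j in combinations(range(7), 2)}
det_abs = abs(determinant(X))
```

Every weighing involves four objects, every object is weighed four times, every pair is weighed together twice, and the absolute value of the determinant is 32, which equals $8^4/2^7$.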


5. Determination of a linear function of the objects. An orthogonalized
design which is cent percent efficient to determine individually the weight of
$p$ unknown objects is not necessarily the design of maximum efficiency for
the estimation of a linear function of the objects. To illustrate this, let
there be three objects, the weights $\theta_1, \theta_2, \theta_3$ of which
have to be estimated on a balance corrected for zero bias and let us, for this
purpose, concentrate on the design characterized by the matrix given below.

(5)   [design matrix $X$ for four weighings of three objects, with mutually
      orthogonal columns; the entries are illegible in the source]

As has been indicated in the previous papers, the variance of each of the
unknown objects comes out to be $\frac14\sigma^2$, which is the minimum
minimorum, and as such the above design enjoys the cent percent efficiency
when the question of individual estimation is concerned. But in estimating a
linear function of the objects, for instance the total weight, designs more
efficient than this are available.

The variance of $l_1\theta_1 + l_2\theta_2 + l_3\theta_3$ is known to be

$$\sum_{i=1}^{3}\sum_{j=1}^{3} l_i l_j C_{ij}\,\sigma^2, \tag{6}$$

where $C_{ij}$ denotes the elements of the matrix reciprocal to the matrix
$X'X$. As the above design furnishes the estimates of the unknown objects
orthogonally, the variance of the estimated total weight of the three objects
will be given by $\frac34\sigma^2$. If, however, the design given by the matrix

(7)   [design matrix $X$; the entries are illegible in the source]

be adopted, the variance of the estimate of the total weight may be easily
seen to be $(3/7)\sigma^2$, by putting $l_1 = l_2 = l_3 = 1$. $(3/7)\sigma^2$
is evidently less than $\frac34\sigma^2$. Therefore with four weighings, the
design characterized by (7) is more efficient in estimating the total weight
than that characterized by (5). A still more efficient design for getting the
total weight is simply to weigh all the objects together four times.
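The comparison above can be illustrated numerically, with one caveat: the entries of the design matrices are not legible in the source, so the matrix used below is an assumption, namely three mutually orthogonal columns of a 4 x 4 Hadamard matrix, chosen because it reproduces the individual variance $\frac14\sigma^2$ stated in the text. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed chemical-balance design: three orthogonal +/-1 columns, four weighings.
X = [[ 1,  1,  1],
     [ 1, -1, -1],
     [-1,  1, -1],
     [-1, -1,  1]]

def gram(design):
    """Return X'X for a design matrix given as a list of rows."""
    p = len(design[0])
    return [[sum(row[i] * row[j] for row in design) for j in range(p)]
            for i in range(p)]

def variance_factor_orthogonal(design, coeffs):
    """Variance of sum l_i * theta_i in units of sigma^2, for orthogonal columns only."""
    g = gram(design)
    p = len(g)
    for i in range(p):
        for j in range(p):
            if i != j and g[i][j] != 0:
                raise ValueError("columns are not orthogonal")
    # With X'X diagonal, the reciprocal matrix is diagonal with entries 1 / g[i][i].
    return sum(Fraction(l * l, g[i][i]) for i, l in enumerate(coeffs))

single = variance_factor_orthogonal(X, [1, 0, 0])     # one object alone
total = variance_factor_orthogonal(X, [1, 1, 1])      # total weight
together = Fraction(1, 4)   # all objects weighed together four times: sigma^2 / N
```

Under this assumed design the factor for a single object is 1/4 and for the total weight 3/4, against 3/7 for the alternative design quoted in the text and 1/4 for four joint weighings.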


6. Designs with arrangements afforded by balanced incomplete blocks. The 
necessity for an efficient design to estimate any linear function of the objects 











398 K. 8. BANERJEE 


(or to be precise, say to estimate the total weight) will perhaps arise only when 
the objects cannot all be weighed at a time collectively on a single pan. Here 
also, an efficient design under the supposition that all the objects cannot be 
weighed together is afforded by the arrangements in balanced incomplete blocks.
In such a design, the diagonal elements in the matrix reciprocal to X'X will be
all positive and equal to

(8) $\dfrac{r + \lambda(p-2)}{(r-\lambda)\{r+\lambda(p-1)\}}$,

while the remaining elements in the reciprocal matrix will be negative and equal to

(9) $\dfrac{-\lambda}{(r-\lambda)\{r+\lambda(p-1)\}}$.

Using the generalized form of (6) and admitting of the possibility that any of the
arbitrary constants $l_i$ may be negative, the variance of the linear function
$\sum_{i=1}^{p} l_i\theta_i$ may be easily seen to be

(10) $\left[\dfrac{\sum l_i^2}{r-\lambda} - \dfrac{\lambda\left(\sum l_i\right)^2}{(r-\lambda)\{r+\lambda(p-1)\}}\right]\sigma^2$.

If, however, in the above expression, the coefficients $l_i$ are equal to 1, (10) is the
variance of the estimated total weight, and reduces to

(11) $\dfrac{p\,\sigma^2}{r+\lambda(p-1)}$.

When there are N weighings in all, the minimum variance that can be reached
is $\sigma^2/N$ and will be attained, it appears, only when all the objects are weighed
together and the weighing is repeated N times. The efficiency of a given design
may therefore be calculated with reference to $\sigma^2/N$. Remembering that the
number of weighings takes the place of the number of blocks and p the place of v,
so that $Nk = pr$ and $\lambda(p-1) = r(k-1)$ and the variance of the estimated total weight becomes $p\sigma^2/(rk)$,
the efficiency of the design will reduce to $(k/p)^2$, where k is the number of plots per
block, i.e. the number of objects that can be weighed at a time.
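
The efficiency $(k/p)^2$ can be checked numerically on a small case. The sketch below is an illustration only, not part of the paper: it builds the design that weighs every pair out of four objects (the helper names and the choice p = 4, k = 2 are assumptions made for the example), inverts X'X exactly, and compares the variance of the estimated total weight, in units of $\sigma^2$, with $p/\{r+\lambda(p-1)\}$.

```python
from fractions import Fraction
from itertools import combinations

def invert(matrix):
    """Exact Gauss-Jordan inverse of a square matrix of rationals."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def total_weight_variance(design, p):
    """Variance of the estimated total weight, in units of sigma^2."""
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    inverse = invert(xtx)
    # With all l_i = 1 the quadratic form is the sum of all elements of (X'X)^-1.
    return sum(sum(row) for row in inverse)

def all_k_subsets_design(p, k):
    """One weighing (row) per combination of k objects out of p; entries 0 or 1."""
    return [[1 if j in block else 0 for j in range(p)]
            for block in combinations(range(p), k)]

p, k = 4, 2                                      # hypothetical example values
design = all_k_subsets_design(p, k)
N = len(design)                                  # number of weighings
r = sum(row[0] for row in design)                # times each object is weighed
lam = sum(row[0] * row[1] for row in design)     # times each pair is weighed together
var_total = total_weight_variance(design, p)
efficiency = Fraction(1, N) / var_total          # reference variance is sigma^2 / N
```

Exact rational arithmetic is used so that the comparison with the closed forms is an equality rather than a floating-point approximation.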

If, however, the combinatorial arrangement is adopted, weighing all possible
combinations of k objects and making $\binom{p}{k}$ weighings in all, the same efficiency
as above will be obtained for such a design.

Given k, the above expression of efficiency will therefore be the deciding factor 
for choice between an arrangement of balanced incomplete block design and all 
possible combinations of k objects. 


7. Design of maximum efficiency. Designs leading to the matrix X’X of 
the type (1) have certain advantages inasmuch as the variances of the individual 
objects are equal, as are also the covariances between all possible pairs. The 







variance of the estimated total weight in such a design is given by (11). To 
minimize the variance thus obtained, the expression 


(12) $r + (p-1)\lambda$


has to be the maximum for a given value of p. In an arrangement of the bal- 
anced incomplete block type or in an arrangement with all possible combinations 
of k objects being weighed at a time, (12) would reduce to rk and would therefore 
increase with the increasing value of rk. This shows that the estimation of the 
total weight will have increased precision if more of the objects are weighed at a 
time. 

If all the objects could be weighed at a time and both the pans be used for the
purpose, some of the elements in the matrix X will be −1 instead of 0. This
would increase the value of r but would decrease the value of λ. To devise the
best possible design therefore, account will have to be taken simultaneously of
r and λ.


REFERENCES 


[1] Harold Hotelling, "Some improvements in weighing and other experimental techniques", Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[2] A. M. Mood, "On Hotelling's weighing problem", Annals of Math. Stat., Vol. 17 (1946), pp. 432-446.

[3] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, Oliver and Boyd, London, 1938, pp. 10-13.

[4] K. Kishen, "On the design of experiments for weighing and making other types of measurements", Annals of Math. Stat., Vol. 16 (1945), pp. 294-300.

[5] C. R. Rao, "On the most efficient designs in weighing", Sankhyā, Vol. 7 (1946), pp. 440











BOUNDS FOR SOME FUNCTIONS USED IN SEQUENTIALLY TESTING 
THE MEAN OF A POISSON DISTRIBUTION¹


By Leon H. Herbach


Brooklyn College 


1. Introduction. Let $z = \log \dfrac{f(x, \lambda_1)}{f(x, \lambda_0)}$, where $f(x, \lambda_i) = (e^{-\lambda_i}\lambda_i^{x})/x!$,
(i = 0, 1), is the elementary probability law of a Poisson variate X, under the
hypothesis that the mean is equal to $\lambda_i$. Without loss of generality we shall
assume $\lambda_1 > \lambda_0$.

Let $H_0$ be the hypothesis that the distribution of X is given by $f(x, \lambda_0)$. Wald
[1, pp. 286-287] has devised general upper and lower bounds for the probability
of accepting $H_0$, when λ is the true value of the parameter, and the sequential
probability ratio test is used. This probability is called the operating-characteristic
function and is designated by L(λ). Using these results he has computed
the bounds for the binomial and normal distributions [2, pp. 137-142].
We shall do the same thing for the Poisson distribution, since the restrictions
[1, p. 284, conditions I to III] under which these general limits are valid can
rather easily be shown to apply to the Poisson distribution, if we make the further
restriction that E(z) ≠ 0.

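These general results are

(1) $\dfrac{1 - B^{h}}{\delta A^{h} - B^{h}} \le 1 - L(\lambda) \le \dfrac{1 - \eta B^{h}}{A^{h} - \eta B^{h}}$ if h > 0,

and

$\dfrac{1 - A^{h}}{\delta B^{h} - A^{h}} \le L(\lambda) \le \dfrac{1 - \eta A^{h}}{B^{h} - \eta A^{h}}$ if h < 0,

where α, β are probabilities of committing errors of the first and second kind respectively and

A = (1 − β)/α, B = β/(1 − α),

(2) $\eta = \operatorname{g.l.b.}_{\rho}\ \rho\,E\!\left(e^{hz} \mid e^{hz} \le 1/\rho\right),\ \rho > 1;\qquad \delta = \operatorname{l.u.b.}_{\rho}\ \rho\,E\!\left(e^{hz} \mid e^{hz} \ge 1/\rho\right),\ 0 < \rho < 1;$

and h is the non-zero root of the expression $Ee^{hz} = 1$. Hence the only remaining
unknowns are η and δ.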


1 The author is indebted to Professor A. Wald for suggesting the problem which led to 
this note and for helpful discussions. 




The following bounds to En, the expected number of observations required
by the sequential probability ratio test defined by α, β, have been derived [2, pp.
143-147]:

$\dfrac{L(\lambda)(\log B + \xi') + [1 - L(\lambda)]\log A}{Ez}\ \substack{\le\\ \ge}\ En\ \substack{\le\\ \ge}\ \dfrac{L(\lambda)\log B + [1 - L(\lambda)](\log A + \xi)}{Ez},$

the upper or lower inequality signs holding according as Ez > 0 or Ez < 0, where

(3) $\xi' = \min_{r} E(z + r \mid z + r \le 0),$

(4) $\xi = \max_{r} E(z - r \mid z - r \ge 0), \quad (r \ge 0).$

Using the limits to L(λ), we then find ξ and ξ', which determine the bounds to En.


2. Special terminology. By an almost-increasing function we shall mean one
that has the following properties: If x is any point of discontinuity, then (a) x + k
is also, where k is any integer, and x + l is a point of continuity if l is not integral,
(b) f(x − ε) < f(x − ε') < f(x) for 0 < ε' < ε < 1, (c) f(x − 1) < f(x), (d)
$\lim_{\varepsilon \to 0} f(x + \varepsilon) = f(x+) < f(x)$, (e) f(x − 1 +) < f(x +). It is clear that the
minimum value for f(y) in any closed interval [a, b] is equal to min [f(a), f(a' +)]
where a' is defined as a if the closed interval contains no discontinuity, and as
the leftmost point of discontinuity otherwise. As special cases, if a is a point of
discontinuity this minimum is f(a +) and if x < a < b < x + 1 the minimum
is f(a).

Almost-decreasing functions are defined similarly except that the inequalities
go the other way. In this case the maximum in the interval is max [f(a), f(a' +)]
and we have special cases as above.


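3. The case h > 0. Since $e^{z} = a^{x}e^{-c}$, where $a = \lambda_1/\lambda_0$ and $c = \lambda_1 - \lambda_0$, the
condition $e^{hz} \le 1/\zeta$ (with ζ > 1 in the role of ρ in (2)) may be expressed as $a^{xh}e^{-ch} \le 1/\zeta$, whence

(5) $x \le c/\log a - \log \zeta/(h \log a) = s - r$ (say).

Since x ≥ 0, r ≤ s. Hence 0 < r ≤ s. Also

(6) $Ee^{hz} = \sum_{x=0}^{\infty} \left(e^{-ch}a^{xh}\right)\dfrac{e^{-\lambda}\lambda^{x}}{x!} = \exp(-ch - \lambda + \lambda a^{h}),$

and

(7) $\zeta E\!\left(e^{hz} \mid e^{hz} \le 1/\zeta\right) = \zeta E\!\left[\left(e^{-ch}a^{xh}\right) \mid x \le s - r\right].$

From (5), $\zeta = a^{rh}$ and (7) becomes

(7.1) $a^{rh}\,\dfrac{\sum_{x=0}^{[s-r]} e^{-ch}a^{xh}\lambda^{x}/x!}{\sum_{x=0}^{[s-r]} \lambda^{x}/x!},$

where [s − r] is the largest integer ≤ (s − r). Our problem is to minimize (7)
with respect to ζ. Since r is a strictly increasing function of ζ, this is equivalent
to minimizing $a^{rh}C/D = \theta$ (say) with respect to r, where

$C = \sum_{x=0}^{[s-r]} \dfrac{(\lambda a^{h})^{x}}{x!}$, and $D = \sum_{x=0}^{[s-r]} \dfrac{\lambda^{x}}{x!}.$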





It will be shown that (7.1) is an almost-increasing function of r and therefore
the minimum occurs at either r = 0 or r = ν +, where ν = s − [s], since the
saltuses occur at r = ν + k for k = 0, 1, 2, ···, [s].

Since $a^{rh}$ is an increasing function of r and C/D remains constant as long as
[s − r] remains constant, condition (b) is fulfilled.

Conditions (c) to (e) refer to the saltuses only, hence, to show them, we may
assume, without loss of generality, that r and s are integral. We proceed by induction,
using the notation θ(w) to mean the value of θ when r = w, to show (c).

First we prove the following:

Lemma A. θ(s) > θ(s − 1).

Proof: Since we assumed $\lambda_1 > \lambda_0$ and h > 0, $a^{h} > 1$. Hence $(1 + \lambda)a^{h} > 1 + \lambda a^{h}$,
whence, a fortiori, $a^{sh} > a^{(s-1)h}(1 + \lambda a^{h})/(1 + \lambda)$.

To show that if θ(r + 1) > θ(r), then θ(r) > θ(r − 1), we shall show that

(8) $CD + Dba^{(n+1)h} < CDa^{h} + Cb$

implies

(9) $CD + Dbqa^{(n+1)h} < CDa^{h} + Cbqa^{h},$

where n = s − r, $b = \lambda^{n}/n!$, q = λ/(n + 1).

Since, as we shall see below,

(10) $Dba^{(n+1)h}(q - 1) < Cb(qa^{h} - 1),$

or

(11) $Da^{(n+1)h}(q - 1) < C(qa^{h} - 1),$

addition of (8) and (10) yields the desired result, (9).
It now remains to prove (11) or that

(12) $\left[\sum_{x=0}^{n} \dfrac{\lambda^{x}}{x!}\right] a^{(n+1)h}(\lambda - n - 1) < \left[\sum_{x=0}^{n} \dfrac{(\lambda a^{h})^{x}}{x!}\right](\lambda a^{h} - n - 1).$








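Setting (6) equal to 1 we get $\lambda a^{h} = ch + \lambda$, which when substituted in (12)
yields

$\dfrac{(ch + \lambda)^{n+1}}{\lambda^{n+1}}(\lambda - n - 1)\sum_{x=0}^{n}\dfrac{\lambda^{x}}{x!} < (ch + \lambda - n - 1)\sum_{x=0}^{n}\dfrac{(ch+\lambda)^{x}}{x!}.$

Upon letting p = ch + λ, we have

$F(\lambda) = \dfrac{\lambda - (n+1)}{\lambda^{n+1}}\sum_{x=0}^{n}\dfrac{\lambda^{x}}{x!} < \dfrac{p - (n+1)}{p^{n+1}}\sum_{x=0}^{n}\dfrac{p^{x}}{x!} = F(p).$

Then our problem reduces to showing that F(y) is increasing in 0 < λ < y < p
or that the derivative with respect to y, F'(y), is positive.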


Differentiating term by term and collecting powers of y,

$F'(y) = y^{-(n+2)}\sum_{x=0}^{n}\dfrac{(n+1-x)^{2}\,y^{x}}{x!}$
> 0 since y > 0.


Thus condition (c) is demonstrated. To show (d) we must show that
θ(r +) < θ(r), which means that

$a^{rh}\,\dfrac{C - ba^{nh}}{D - b} < a^{rh}\,\dfrac{C}{D}.$

But this is true if $C < Da^{nh}$, which is easily verified. Condition (e) is equivalent
to showing that

$a^{(r-1)h}\,\dfrac{C}{D} < a^{rh}\,\dfrac{C - ba^{nh}}{D - b},$

which is proved just as (c) was.


Hence,

(13) $\eta = \min\left\{ e^{-ch}\,\dfrac{\sum_{x=0}^{[s]} (\lambda a^{h})^{x}/x!}{\sum_{x=0}^{[s]} \lambda^{x}/x!},\;\; a^{\nu h}e^{-ch}\,\dfrac{\sum_{x=0}^{[s-1]} (\lambda a^{h})^{x}/x!}{\sum_{x=0}^{[s-1]} \lambda^{x}/x!} \right\}.$

As special cases we have (i) if s is integral, η is the latter and ν = 0 and (ii)
if s < 1, (b) is the only applicable condition and we have an ordinary increasing
function, hence η is the former.

Similarly, it may be shown that

(14) $\delta = \max\left[e^{-ch}E\!\left(a^{xh} \mid x \ge \{s\}\right),\; a^{-\nu h}e^{-ch}E\!\left(a^{xh} \mid x \ge \{s+1\}\right)\right],$

where {s} is the smallest integer ≥ s and ν = {s} − s. Here there is only one
special case, namely (i). If h < 0, δ is the larger of the two expressions on the
right side of (13) and η is the smaller of the two corresponding expressions in
(14).
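
The two-candidate expressions for η and δ lend themselves to direct computation. The following is a minimal sketch, not the author's procedure: it assumes the case Ez < 0 (so that h > 0), finds h by bisection from $\exp(-ch - \lambda + \lambda a^{h}) = 1$, and evaluates the candidates as read from (13) and (14). The function names, the truncation point `x_max` and the example values $\lambda_0 = 1$, $\lambda_1 = 2$, λ = 1 are assumptions made for illustration.

```python
import math

def partial_sum(mu, lo, hi):
    """Sum of mu**x / x! over integers lo <= x <= hi."""
    term, total = 1.0, 0.0
    for x in range(hi + 1):
        if x > 0:
            term *= mu / x
        if x >= lo:
            total += term
    return total

def nonzero_root_h(lam, a, c):
    """Positive root of g(h) = lam*(a**h - 1) - c*h, i.e. of E exp(hz) = 1."""
    log_a = math.log(a)
    if lam * log_a >= c:
        raise ValueError("sketch covers only E z < 0, where the root h is positive")
    g = lambda h: lam * (a ** h - 1.0) - c * h
    lo = math.log(c / (lam * log_a)) / log_a   # g is convex; this is its minimiser
    hi = 2.0 * lo
    while g(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):                       # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def eta_delta(lam0, lam1, lam, x_max=200):
    """Return (h, eta, delta) for h > 0; infinite sums are truncated at x_max."""
    a, c = lam1 / lam0, lam1 - lam0
    h = nonzero_root_h(lam, a, c)
    s = c / math.log(a)
    floor_s, ceil_s = math.floor(s), math.ceil(s)
    # E(a^{xh} | lo <= x <= hi) for a Poisson(lam) variate; exp(-lam) cancels.
    cond = lambda lo, hi: partial_sum(lam * a ** h, lo, hi) / partial_sum(lam, lo, hi)
    base = math.exp(-c * h)
    eta_candidates = [base * cond(0, floor_s)]
    if floor_s >= 1:                           # special case (ii): s < 1 leaves one candidate
        eta_candidates.append(a ** ((s - floor_s) * h) * base * cond(0, floor_s - 1))
    delta_candidates = [
        base * cond(ceil_s, x_max),
        a ** (-(ceil_s - s) * h) * base * cond(ceil_s + 1, x_max),
    ]
    return h, min(eta_candidates), max(delta_candidates)

h, eta, delta = eta_delta(1.0, 2.0, 1.0)
```

For these example values the equation for h is $2^{h} - 1 - h = 0$, whose non-zero root is h = 1. Taking the minimum and maximum over both candidates also covers special case (i) without separate handling.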



























4. Since $z = -c + x \log a$, ξ may be written

$\max_{t}\ \log a\,E(x - t \mid x \ge t),$

where t = (r + c)/(log a). Hence s = c/log a ≤ t < ∞. Therefore if we can
show that E(x − t | x ≥ t) = y(t) (say) is an almost-decreasing function of t
we will know that ξ occurs either when t = s or {s} + since, as will be seen, the
jumps occur at integral t.

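To show (c) we make use of the following, which is easily proven:

Lemma B. Let X, Y, Z each be greater than zero. Then a necessary and sufficient
condition that $\dfrac{X}{Y} < \dfrac{X + Y}{Y + Z}$ is that $XZ < Y^{2}$.

Therefore, to show for integral t that

(15) y(t) < y(t − 1),

or that

$\dfrac{\sum_{x=t}^{\infty}(x-t)\lambda^{x}/x!}{\sum_{x=t}^{\infty}\lambda^{x}/x!} < \dfrac{\sum_{x=t}^{\infty}(x-t)\lambda^{x}/x! + \sum_{x=t}^{\infty}\lambda^{x}/x!}{\sum_{x=t}^{\infty}\lambda^{x}/x! + \lambda^{t-1}/(t-1)!},$

we need only show that, for all integral t,

(16) $\dfrac{\lambda^{t-1}}{(t-1)!}\sum_{x=t}^{\infty}\dfrac{(x-t)\lambda^{x}}{x!} < \left[\sum_{x=t}^{\infty}\dfrac{\lambda^{x}}{x!}\right]^{2}.$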
Since both sides of (16) are power series in λ where the exponents start with 2t
we need only show that the coefficient of every term on the left is less than the
corresponding term on the right.

In the case of the coefficient of $\lambda^{2t+2j}$, (j ≥ 0) we have to show that

$\dfrac{2j+1}{(t+2j+1)!\,(t-1)!} < \dfrac{2}{(t+2j)!\,t!} + \dfrac{2}{(t+2j-1)!\,(t+1)!} + \cdots + \dfrac{2}{(t+j+1)!\,(t+j-1)!} + \dfrac{1}{(t+j)!\,(t+j)!},$

or by multiplying both sides by (2t + 2j)! that

$(2j+1)\binom{2t+2j}{t-1} < 2\left\{\binom{2t+2j}{t} + \binom{2t+2j}{t+1} + \cdots + \binom{2t+2j}{t+j-1}\right\} + \binom{2t+2j}{t+j} = M$, say.

Replacing all the binomial coefficients on the right by the smallest one we have

$(2j+1)\binom{2t+2j}{t-1} < (2j+1)\binom{2t+2j}{t} \le M,$













since $\binom{n}{s-1} < \binom{n}{s}$ for n ≥ 2s. Thus the truth of (16) has been established
for even exponents. The odd terms are treated similarly.
Hence, we have shown that y(t) is a strictly decreasing function of t, if t takes
on integral values only. We shall now show (b), i.e. that

(17) $y(t) = \dfrac{\sum_{x=t}^{\infty}(x-t)\lambda^{x}/x!}{\sum_{x=t}^{\infty}\lambda^{x}/x!} < \dfrac{\sum_{x=t}^{\infty}(x-t+\varepsilon)\lambda^{x}/x!}{\sum_{x=t}^{\infty}\lambda^{x}/x!} = y(t - \varepsilon).$

The denominators are equal and each term of the numerator on the right is
greater than the corresponding term on the left, hence (17) is valid.
Conditions (a) and (d) can be shown, by showing in a similar manner, that

(18) y(t +) = 1 + y(t + 1)

and y(t) < 1 + y(t + 1) for integral t. By using (18) for t and t − 1 together
with (15) we show y(t − 1 +) > y(t +), which is condition (e). Thus we have
shown that

$\xi = \max\left\{ -c + \log a\,\dfrac{\sum_{x=\{s\}}^{\infty} x\lambda^{x}/x!}{\sum_{x=\{s\}}^{\infty}\lambda^{x}/x!},\;\; \log a\left(-\{s\} + \dfrac{\sum_{x=\{s+1\}}^{\infty} x\lambda^{x}/x!}{\sum_{x=\{s+1\}}^{\infty}\lambda^{x}/x!}\right)\right\}.$

As in Section 3, ξ' is the lower analogue of ξ, i.e.

$\xi' = \min\,\{-c + \log a\,E(x \mid x \le [s]),\; -[s]\log a + \log a\,E(x \mid x \le [s-1])\},$


and the special cases are as in that section. 
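
The behaviour of y(t) = E(x − t | x ≥ t) used in this section can be examined numerically. The sketch below is illustrative only: the function name, the truncation point `x_max` and the value λ = 2.5 are assumptions. It evaluates y(t) for a Poisson variate and can be used to compare y(t) with y(t − 1), with y(t − ε), and with the right-hand limit y(t +).

```python
import math

def y(t, lam, x_max=300):
    """E(x - t | x >= t) for a Poisson(lam) variate x; t need not be integral."""
    start = math.ceil(t)       # the condition x >= t selects integers x >= ceil(t)
    term = 1.0                 # lam**x / x!, built iteratively (exp(-lam) cancels)
    num = den = 0.0
    for x in range(x_max + 1):
        if x > 0:
            term *= lam / x
        if x >= start:
            num += (x - t) * term
            den += term
    return num / den

lam = 2.5                      # hypothetical example value
values_at_integers = [y(t, lam) for t in range(0, 8)]
jump_just_after_3 = y(3 + 1e-9, lam) - y(3, lam)
```

Here `values_at_integers` decreases along the integers, while `jump_just_after_3` is positive: immediately to the right of an integer the conditioning set loses the point x = t, so the function jumps upward before decreasing again.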


REFERENCES 


[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), pp. 283-296.

[2] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16
(1945), pp. 117-186.











NOTES 
This section is devoted to brief research and expository articles and other short items.


THE DISTRIBUTION OF STUDENT’S t WHEN THE POPULATION MEANS 
ARE UNEQUAL 


By HERBERT ROBBINS 
Department of Mathematical Statistics, University of North Carolina


Let $x_1, \cdots, x_N$ be independent normal variates with the same variance $\sigma^2$
and with means $\mu_1, \cdots, \mu_N$ respectively. Set n = N − 1 and let

(1) $\bar{x} = \sum_{1}^{N} x_i/N, \quad s^2 = \sum_{1}^{N} (x_i - \bar{x})^2/n, \quad t = N^{1/2}\bar{x}/s.$

If all the $\mu_i$ are 0 then t has Student's distribution with n degrees of freedom; its
frequency function will be denoted here by

(2) $f_{n,0}(t) = \left[n^{1/2} B\!\left(\tfrac12, \tfrac n2\right)\right]^{-1}(1 + t^2/n)^{-(n+1)/2}.$

When dealing with situations involving mixtures of populations or in which the
mean exhibits a secular trend, it is important to know the distribution of t
when the $\mu_i$ are arbitrary; in the general case let

(3) $\bar{\mu} = \sum_{1}^{N} \mu_i/N, \quad \beta^2 = \sum_{1}^{N} (\mu_i - \bar{\mu})^2/N, \quad \alpha = N\bar{\mu}^2/2\sigma^2, \quad \lambda = N\beta^2/2\sigma^2.$

The distribution of t will be shown to depend on the three parameters n, α, λ.
If λ = β = 0, so that all the $\mu_i$ are equal, then the distribution of t determines
the power function of the ordinary t test. We shall here consider the case in
which α = $\bar{\mu}$ = 0, although the $\mu_i$ are different. Denoting the frequency function
of t in this case by $f_{n,\lambda}(t)$ we shall show that

(4) $f_{n,\lambda}(t) = f_{n,0}(t) \cdot \exp\left\{\dfrac{-\lambda t^2/n}{1 + t^2/n}\right\} \cdot F\!\left(-\tfrac12, \tfrac n2, -\lambda(1 + t^2/n)^{-1}\right),$

where F denotes the confluent hypergeometric series, and where, since $\bar{\mu}$ = 0,

(5) $\lambda = \sum_{1}^{N} \mu_i^2/2\sigma^2.$
1 


In fact, the general distribution of t, of which (4) represents the case α = 0,
may be derived as follows. Using the standard orthogonal transformation
[1, p. 387] let


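(6) $z_i = \sum_{j=1}^{N} c_{ij}x_j, \quad a_i = \sum_{j=1}^{N} c_{ij}\mu_j \quad (i = 1, \cdots, N),$

where

(7) $c_{1j} = N^{-1/2} \quad (j = 1, \cdots, N);$

then

(8) $t = n^{1/2} z_1 \Big/ \left(\sum_{2}^{N} z_i^2\right)^{1/2}.$

The joint frequency function of the $z_i$ is easily seen to be

(9) $(2\pi\sigma^2)^{-N/2}\exp\left\{-\sum_{1}^{N}(z_i - a_i)^2/2\sigma^2\right\},$

where

(10) $a_1 = N^{1/2}\bar{\mu}, \quad \sum_{2}^{N} a_j^2 = N\beta^2.$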


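Thus t is the ratio of a non-central normal variate to the square root of an independent
non-central chi-square variate. It is known [2, p. 138] that the
frequency function of $q^2 = \sum_{2}^{N} z_i^2/\sigma^2$ is

(11) $\tfrac12\,e^{-\lambda}e^{-q^2/2}\left(\tfrac12 q^2\right)^{(n/2)-1}\sum_{j=0}^{\infty}\dfrac{\left(\tfrac12\lambda q^2\right)^{j}}{j!\,\Gamma(n/2 + j)},$

where

(12) $\lambda = \sum_{2}^{N} a_j^2/2\sigma^2 = N\beta^2/2\sigma^2.$

The frequency function of $v = z_1/\sigma$ is

$g(v) = \dfrac{1}{\sqrt{2\pi}}\exp\left\{-\dfrac{(\sigma v - a_1)^2}{2\sigma^2}\right\} = \dfrac{1}{\sqrt{2\pi}}\,e^{-v^2/2}e^{-\alpha}\sum_{k=0}^{\infty}\dfrac{(2\alpha)^{k/2}v^{k}}{k!};$

that of q is, by (11),

$h(q) = 2^{1-(n/2)}e^{-\lambda}e^{-q^2/2}\sum_{j=0}^{\infty}\dfrac{(\lambda/2)^{j}q^{2j+n-1}}{j!\,\Gamma(n/2 + j)};$

hence that of $u = v/q = n^{-1/2}t$ is

$\int_{0}^{\infty} q\,g(uq)\,h(q)\,dq,$

which, after integration, reduces to

(13) $\pi^{-1/2}e^{-(\alpha+\lambda)}\sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\dfrac{(2\alpha^{1/2}u)^{k}\lambda^{j}\,\Gamma\!\left(\frac{n+k+1}{2} + j\right)}{k!\,j!\,\Gamma(n/2 + j)}\,(1 + u^2)^{-\left(\frac{n+k+1}{2} + j\right)}.$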













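In particular, if α = $\bar{\mu}$ = 0 then (13) reduces by means of the relation
$F(a, \gamma, x) = e^{x}F(\gamma - a, \gamma, -x)$ to

(14) $\left[B\!\left(\tfrac12, \tfrac n2\right)\right]^{-1}(1 + u^2)^{-(n+1)/2}\exp\left\{\dfrac{-\lambda u^2}{1 + u^2}\right\} \cdot F\!\left(-\tfrac12, \tfrac n2, -\lambda(1 + u^2)^{-1}\right),$

from which it follows that the frequency function of t is given by (4).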

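Again, let $x_1, \cdots, x_{N_1+N_2}$ be independent normal variates with the same variance
$\sigma^2$ and with means $\mu_1, \cdots, \mu_{N_1+N_2}$ respectively. Set $n_1 = N_1 - 1$,
$n_2 = N_2 - 1$, $n = n_1 + n_2$, and let

(15) $\bar{x}_1 = \sum_{1}^{N_1} x_i/N_1, \quad \bar{x}_2 = \sum_{N_1+1}^{N_1+N_2} x_i/N_2,$
$\quad s_1^2 = \sum_{1}^{N_1}(x_i - \bar{x}_1)^2/n_1, \quad s_2^2 = \sum_{N_1+1}^{N_1+N_2}(x_i - \bar{x}_2)^2/n_2,$
$\quad s^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2), \quad t = [N_1N_2/(N_1 + N_2)]^{1/2}(\bar{x}_1 - \bar{x}_2)/s.$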


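If all the $\mu_i$ are equal then t again has Student's distribution with n degrees of
freedom. In the general case let

(16) $\bar{\mu}_1 = \sum_{1}^{N_1} \mu_i/N_1, \quad \bar{\mu}_2 = \sum_{N_1+1}^{N_1+N_2} \mu_i/N_2,$
$\quad \beta_1^2 = \sum_{1}^{N_1}(\mu_i - \bar{\mu}_1)^2/N_1, \quad \beta_2^2 = \sum_{N_1+1}^{N_1+N_2}(\mu_i - \bar{\mu}_2)^2/N_2.$

Then we may show as before [1, p. 388] that in this case $u = n^{-1/2}t$ has the
frequency function (13), where now

(17) $N = N_1 + N_2 - 1, \quad \lambda = (N_1\beta_1^2 + N_2\beta_2^2)/2\sigma^2, \quad \alpha = [N_1N_2/(N_1 + N_2)](\bar{\mu}_1 - \bar{\mu}_2)^2/2\sigma^2.$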

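In particular, when α = $\bar{\mu}_1 - \bar{\mu}_2$ = 0, so that $\bar{\mu}_1 = \bar{\mu}_2 = \bar{\mu}$, say, the frequency
function $f_{n,\lambda}(t)$ of t is again given by (4), where now

(18) $\lambda = \sum_{1}^{N_1+N_2}(\mu_i - \bar{\mu})^2/2\sigma^2.$

Extensions in this direction to the general linear hypothesis in the analysis of
variance will not be treated here.
If we set

(19) $w = (1 + t^2/n)^{-1}$

where t has the frequency function (4), then w will have the frequency function

(20) $g_{n,\lambda}(w) = \left[B\!\left(\tfrac12, \tfrac n2\right)\right]^{-1}e^{-\lambda(1-w)}\,w^{(n/2)-1}(1 - w)^{-1/2}\,F\!\left(-\tfrac12, \tfrac n2, -\lambda w\right),$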







for 0 < w < 1. Thus for every t,

(21) $1 - \int_{-t}^{t} f_{n,\lambda}(x)\,dx = \int_{0}^{(1+t^2/n)^{-1}} g_{n,\lambda}(w)\,dw.$

It would be interesting to have numerical values of the integral on the left side
of (21) for that value of t for which

(22) $1 - \int_{-t}^{t} f_{n,0}(x)\,dx = 0.01$ or 0.05 (say),

but existing tables (e.g. those in [2] and [3]) of the integral of (20) were compiled
for a different purpose and do not supply this information. The following remarks
throw some light on this subject.


Let us set

$R(t) = f_{n,\lambda}(t)/f_{n,0}(t) = \exp\left\{\dfrac{-\lambda t^2/n}{1 + t^2/n}\right\} \cdot F\!\left(-\tfrac12, \tfrac n2, -\lambda(1 + t^2/n)^{-1}\right)$
$\quad = \{1 - \lambda(t^2/n)/(1 + t^2/n) + o(\lambda)\} \cdot \{1 + \lambda/(n + t^2) + o(\lambda)\}$
(23) $\quad = 1 + \lambda(n + t^2)^{-1}(1 - t^2) + o(\lambda).$

Then as λ → 0 we have ultimately

(24) R(t) > 1 if |t| < 1, R(t) < 1 if |t| > 1.

Hence for any t > 1 and for sufficiently small λ,

(25) $1 - \int_{-t}^{t} f_{n,\lambda}(x)\,dx < 1 - \int_{-t}^{t} f_{n,0}(x)\,dx.$

The exact range of values of t for which R(t) < 1 depends of course on n and
λ. However we shall show that always

(26) R(t) < 1 if |t| ≥ 1,

so that (25) holds for all n and λ > 0, provided t ≥ 1. The proof is as follows.
In terms of w we have

(27) $R(t) = e^{-\lambda(1-w)} \cdot F\!\left(-\tfrac12, n/2, -\lambda w\right) = e^{-\lambda}F((n+1)/2, n/2, \lambda w).$

Now

(28) $F((n+1)/2, n/2, \lambda w) = 1 + \sum_{k=1}^{\infty}\dfrac{(n+1)(n+3)\cdots(n+2k-1)}{n(n+2)\cdots(n+2k-2)}\,(\lambda w)^{k}/k!,$

and by induction on k we may show that for all k = 1, 2, ···,

(29) $\dfrac{(n+1)(n+3)\cdots(n+2k-1)}{n(n+2)\cdots(n+2k-2)} \le 1 + k/n,$

where the equality holds only for k = 1. Hence

(30) $F((n+1)/2, n/2, \lambda w) < 1 + \sum_{k=1}^{\infty}(1 + k/n) \cdot (\lambda w)^{k}/k! = e^{\lambda w}(1 + \lambda w/n),$

(31) $R(t) < e^{-\lambda(1-w)} \cdot (1 + \lambda w/n) < e^{-\lambda(1-w)} \cdot e^{\lambda w/n} = e^{-\lambda[1 - w(n+1)/n]}.$

Hence R(t) < 1 if w ≤ n/(n + 1), which is equivalent to (26).
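
The inequality (26) can be spot-checked numerically from the second form in (27). The sketch below is an illustration under stated assumptions, not a substitute for the proof: the series is summed by a fixed number of terms (`terms` is an arbitrary truncation), and the function names and test values of n, λ and t are chosen only for the example.

```python
import math

def kummer_F(a, c, x, terms=400):
    """Confluent hypergeometric series F(a, c, x) = sum_k (a)_k/(c)_k * x**k/k!."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (c + k) * x / (k + 1)
        total += term
    return total

def R(t, n, lam):
    """Ratio f_{n,lam}(t) / f_{n,0}(t), using R = exp(-lam) * F((n+1)/2, n/2, lam*w)."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam) * kummer_F((n + 1) / 2.0, n / 2.0, lam * w)

def R_first_form(t, n, lam):
    """Same ratio from the first form in (27): exp(-lam*(1-w)) * F(-1/2, n/2, -lam*w)."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam * (1.0 - w)) * kummer_F(-0.5, n / 2.0, -lam * w)

ratios = {(n, lam, t): R(t, n, lam)
          for n in (1, 2, 5, 20)
          for lam in (0.1, 1.0, 5.0, 20.0)
          for t in (1.0, 1.5, 3.0, 10.0)}
```

The positive-argument form is used for the main function because all of its series terms are positive, which avoids cancellation when λw is large; the first form is kept only as a cross-check for moderate λ.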


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations
of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 127-149.

[3] Emma Lehmer, "Inverse tables of probabilities of errors of the second kind," Annals
of Math. Stat., Vol. 15 (1944), pp. 388-398.


A DISTRIBUTION-FREE CONFIDENCE INTERVAL FOR THE MEAN 


By Louis Guttman


Cornell University 


1. Summary. Consider a random sample of N observations $x_1, x_2, \cdots, x_N$,
from a universe of mean μ and variance $\sigma^2$. Let m and $s^2$ be the sample mean
and variance respectively:

(1) $m = \dfrac{1}{N}\sum_{i} x_i, \quad s^2 = \dfrac{1}{N}\sum_{i}(x_i - m)^2.$

It is shown that the following conservative confidence interval holds for μ:

(2) $\text{Prob}\,\{(m - \mu)^2 \le s^2/(N-1) + \lambda\sigma^2\sqrt{2/N(N-1)}\} > 1 - \lambda^{-2},$

where λ is any positive constant. Inequality (2) also holds if, in the braces, λ
is replaced by $\sqrt{\lambda^2 - 1}$, with λ ≥ 1.

Inequality (2) is much more efficient on the average than Tchebychef's inequality
for the mean, namely,

(3) $\text{Prob}\,\{(m - \mu)^2 \le \lambda^2\sigma^2/N\} > 1 - \lambda^{-2},$

yet (2) and (3) are both distribution-free, requiring only knowledge about $\sigma^2$.
At the $1 - \lambda^{-2} = .99$ level of confidence, the expected value of the right member
in the braces of (2) is only about 1/6 the corresponding member of (3); at the
.999 level of confidence the ratio is about 1/20.










A more general inequality than (2) is developed, also involving only the single
parameter $\sigma^2$.


2. Derivation. Consider the function

(4) $u = (m - \mu)^2 - s^2/(N-1) - c\sigma^2,$

where c is an arbitrary constant. It is easily verified that $Eu = -c\sigma^2$, and that

(5) $Eu^2 = \sigma^4[2/N(N-1) + c^2].$

A basic feature of (5) is that the only population parameter in the right member
is $\sigma^2$. Contrary to what might have been surmised, the fourth moment of x
about μ is not involved, and indeed need not exist.

According to Tchebychef's inequality,

(6) $\text{Prob}\,\{-\lambda\sqrt{Eu^2} \le u \le \lambda\sqrt{Eu^2}\} > 1 - \lambda^{-2},$

where λ is an arbitrary positive number. Using (4) and (5), it is possible to write
(6) as:

(7) $\text{Prob}\,\{s^2/(N-1) + c\sigma^2 - \lambda\sigma^2\sqrt{2/N(N-1) + c^2} \le (m - \mu)^2 \le s^2/(N-1) + c\sigma^2 + \lambda\sigma^2\sqrt{2/N(N-1) + c^2}\} > 1 - \lambda^{-2}.$

In the braces of (7), if the left member is negative, there is no harm in replacing
it by zero; if it is positive, then replacing it by zero may only increase the probability
of the braces. Regardless of the value of this left member, it is true that

(8) $\text{Prob}\,\{(m - \mu)^2 \le s^2/(N-1) + \sigma^2[c + \lambda\sqrt{2/N(N-1) + c^2}]\} > 1 - \lambda^{-2}.$

If we set c = 0, we have inequality (2). Some improvement over (2) is obtained
by determining c to minimize the right member in the braces of (8), yielding as
the shortest confidence interval:

(9) $\text{Prob}\,\{(m - \mu)^2 \le s^2/(N-1) + \sigma^2\sqrt{2(\lambda^2 - 1)/N(N-1)}\} > 1 - \lambda^{-2}.$

Inequality (9) differs from (2) only by replacing λ in the braces by $\sqrt{\lambda^2 - 1}$.
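
The minimizing value of c is not written out in the text; setting the derivative of $c + \lambda\sqrt{2/N(N-1) + c^2}$ to zero gives $c = -\sqrt{2/\{N(N-1)(\lambda^2 - 1)\}}$ for λ > 1. The sketch below checks this numerically. It is an illustration only; the function names and the values λ = 10, N = 25 are assumptions made for the example.

```python
import math

def slack(c, lam, N):
    """Multiplier of sigma^2 in the braces of (8): c + lam*sqrt(2/(N(N-1)) + c^2)."""
    k = 2.0 / (N * (N - 1))
    return c + lam * math.sqrt(k + c * c)

def optimal_c(lam, N):
    """Stationary point of slack(c) for lam > 1 (derived here, not quoted from the text)."""
    k = 2.0 / (N * (N - 1))
    return -math.sqrt(k / (lam * lam - 1.0))

lam, N = 10.0, 25                      # hypothetical example values
c_star = optimal_c(lam, N)
best = slack(c_star, lam, N)           # multiplier appearing in (9)
at_zero = slack(0.0, lam, N)           # multiplier appearing in (2)
```

Since the function of c being minimized is convex, the stationary point is the global minimum, and `best` agrees with $\sqrt{2(\lambda^2 - 1)/N(N-1)}$ up to rounding.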


3. Comparison with Tchebychef's inequality. The expected value of the
right member of the braces in (2) is

(10) $\sigma^2[1/N + \lambda\sqrt{2/N(N-1)}].$

The ratio of (10) to the corresponding value of Tchebychef's inequality (3),
namely $\lambda^2\sigma^2/N$, is

(11) $[1 + \lambda\sqrt{2N/(N-1)}]/\lambda^2.$

Since (11) decreases as λ increases, the efficiency of inequality (2) increases compared
with that of Tchebychef as the level of confidence $1 - \lambda^{-2}$ increases. The
squared interval of (2) involves only the first power of λ, while that of (3) involves
the second power.
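
The ratio (11) is easy to tabulate. The following sketch, with a hypothetical function name and an arbitrary sample size N = 100, evaluates it at the two confidence levels mentioned in the Summary, where $\lambda^2 = 100$ and $\lambda^2 = 1000$.

```python
import math

def ratio_to_tchebychef(lam, N):
    """Ratio (11): expected right member of (2) over the right member of (3)."""
    return (1.0 + lam * math.sqrt(2.0 * N / (N - 1))) / lam ** 2

N = 100                                           # hypothetical sample size
ratio_99 = ratio_to_tchebychef(math.sqrt(100.0), N)     # 1 - lam^-2 = .99
ratio_999 = ratio_to_tchebychef(math.sqrt(1000.0), N)   # 1 - lam^-2 = .999
```

For large N the factor $\sqrt{2N/(N-1)}$ is close to $\sqrt{2}$, so the ratio behaves like $(1 + \lambda\sqrt{2})/\lambda^2$, in line with the rough figures of 1/6 and 1/20 quoted in the Summary.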


4. Approach to normality. If the fourth moment of the universe's distribution
exists, then it is well known that the ratio of $E(m - \mu)^4$ to $\sigma^4/N^2$ must approach
3 (the ratio for the normal distribution) as N increases. That is, if
$\alpha^2 + 1$ is the ratio, then $\lim_{N \to \infty}\alpha^2 = 2$. It is known¹ that Tchebychef's inequality
can be replaced by one involving both $\alpha^2$ and $\sigma^2$, and that

(12) $\text{Prob}\,\{(m - \mu)^2 \le \sigma^2(1 + \lambda\alpha)/N\} > 1 - \lambda^{-2}.$

If $\alpha^2 = 2$, then the right member in the braces of (12) becomes $\sigma^2(1 + \lambda\sqrt{2})/N$.
This is virtually the same as (10), the expected value from (2). In a sense, then,
(2) implicitly takes account of the fact that the distribution of sample means
approaches that of the normal distribution with respect to the fourth moment.
A striking feature, however, is that (2) holds for any N > 1 and does not even
presume the fourth moment of the universe to exist, whereas to set $\alpha = \sqrt{2}$ in
(12) in general requires a large N and finite universe fourth moment.


5. Further possibilities. Confidence interval (2) is derived from but one
of a series of general intervals, each of which depends only on $\sigma^2$. It may be possible
to derive from this series even more efficient intervals, according to the
method now to be outlined.

One way of arriving at (2) is to consider all products of the form $(x_i - \mu)(x_j - \mu)$,
where i > j and i, j = 1, 2, ···, N. Let $p_2$ be the mean of these
N(N − 1)/2 products. It can easily be seen that $p_2 = u$ in (4) with c = 0,
so that $p_2$ is a second degree polynomial in m − μ, the coefficients being sample
statistics. A more general quadratic would be $u_2 = p_2 + c_1p_1 + c_0$, where $c_1$
and $c_0$ are arbitrary constants and $p_1$ is the mean of the N values $(x_i - \mu)$, or
$p_1 = m - \mu$. It is easily seen that $Ep_1 = Ep_2 = Ep_1p_2 = 0$, and that the only
universe parameter involved in $Ep_1^2$ and $Ep_2^2$ is $\sigma^2$. Hence the only universe parameter
upon which $Eu_2^2$ depends is also $\sigma^2$.

Higher degree polynomials in m − μ can be defined, possessing the same
properties as $u_2$. Let $p_3$ be the mean of the N(N − 1)(N − 2)/3! products of
the form $(x_i - \mu)(x_j - \mu)(x_k - \mu)$, where i > j > k and i, j, k = 1, 2, ···, N;
etc.; and let $p_N = (x_1 - \mu)(x_2 - \mu) \cdots (x_N - \mu)$. Set $p_0 = 1$, and let

(13) $u_n = \sum_{a=0}^{n} c_{an}p_a \quad (n = 1, 2, \cdots, N),$

where the $c_{an}$ are arbitrary constants. It is easily seen that $Ep_a = 0$ (a > 0),
$Ep_ap_b = 0$ (a ≠ b), and that each $Ep_a^2$ depends on only the parameter $\sigma^2$ as far
as the universe is concerned. Hence $Eu_n^2$ depends only on $\sigma^2$. Furthermore, by
writing $x_i - \mu$ as $(x_i - m) + (m - \mu)$, it is seen that $p_a$ is a polynomial of degree
a in m − μ, the coefficients being sample statistics. From (13), then, $u_n$ is a
polynomial of degree n in m − μ with statistics as coefficients.

¹ See, for example, Louis Guttman, "An inequality for kurtosis," Annals of Math.
Stat., Vol. 19 (1948), pp. 277-278.

According to Tchebychef's inequality,

(14) $\text{Prob}\,\{u_n^2 \le \lambda^2 Eu_n^2\} > 1 - \lambda^{-2}.$

The interval for $u_n$ in the braces can be expressed in two statements:

(15) $f_n(m - \mu) = u_n - \lambda\sqrt{Eu_n^2} \le 0,$

(16) $g_n(m - \mu) = u_n + \lambda\sqrt{Eu_n^2} \ge 0.$

Both $f_n$ and $g_n$ are polynomials of degree n in m − μ, $g_n$ exceeding $f_n$ always by
the additive constant $2\lambda\sqrt{Eu_n^2}$. Let $q_n$ and $Q_n$ be the smallest and largest real
zeros respectively of $f_n$, and let $r_n$ and $R_n$ be the smallest and largest real zeros
respectively of $g_n$.

For convenience, we can suppose that $c_{nn}$, the coefficient of $(m - \mu)^n$ in $u_n$,
is positive. If n is even, then $f_n$ is positive for $m - \mu > Q_n$ and for $m - \mu < q_n$.
Hence the interval $q_n \le m - \mu \le Q_n$ contains all the points included in (15)
and possibly more. Since the probability of (15) is not less than the probability
of (14), we can write the following confidence interval:

(17) $\text{Prob}\,\{q_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2}$ (n even).

The problem remains to determine the $c_{an}$ so as to minimize the expected value
of $Q_n - q_n$. Inequality (9) provides the minimum for the case n = 2. This
can be verified by adding the term $c_1p_1$ to u in (4) and finding that the minimum
requires $c_1 = 0$.

If n is odd, we again may set $c_{nn} > 0$. Then $f_n > 0$ for $m - \mu > Q_n$, and
$g_n < 0$ for $m - \mu < r_n$. The interval $r_n \le m - \mu \le Q_n$ thus contains at least
all the points found jointly in (15) and (16) and hence forms a conservative confidence
interval:

(18) $\text{Prob}\,\{r_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2}$ (n odd).

Again, the problem is to determine the $c_{an}$ so as to minimize the expected value
of $Q_n - r_n$. Tchebychef's inequality (3) does this for the case n = 1.

Although the only population parameter involved throughout is $\sigma^2$, the sample
moments up to the nth order are present in (15) and (16). It thus seems plausible
that improvement over inequality (9) should be possible for n > 2. To
obtain such an improvement requires developing a distribution-free theory of
the zeros of $f_n$ and $g_n$ beyond the quadratic case.













ON THE COMPOUND AND GENERALIZED POISSON DISTRIBUTIONS 
By E. Cansado Maceda
University of Madrid 


1. Summary. In this note we deduce several properties of the compound 
and generalized Poisson distributions; in particular their closure and divisibility 
properties. An infinite class of functions whose members are both compound 
and generalized Poisson distributions is exhibited, and several of the distributions 
of Neyman, Polya, etc. are identified. The present note stems from a paper by 
Feller [2]. 


2. The compound Poisson distribution. If F(x | a) is a family of distribution
functions depending on the parameter a, and U(a) is a distribution function
such that it assigns zero probability to any a domain for which F(x | a) is undefined,
then

$G(x) = \int F(x \mid a)\,dU(a)$

is a distribution function. In particular if F(x | a) is the Poisson distribution
with mean a, and U(0) = 0, G(x) is called the compound Poisson distribution
associated with the distribution function U(a); cf. Feller [2]. Clearly G(x) is a
step function over the non-negative integers, the saltus at the point x = n being

$\pi_n = \int_{0}^{\infty} e^{-a}\,\dfrac{a^{n}}{n!}\,dU(a), \quad n = 0, 1, 2, \cdots.$

It is convenient to introduce the factorial moment generating function
(f.m.g.f.) for G(x) as follows

$\omega(z) = E((1 + z)^{x}) = \sum_{n=0}^{\infty}\pi_n(1 + z)^{n} = \int_{0}^{\infty} e^{az}\,dU(a) = \phi(z),$

where φ(z) is the ordinary moment generating function (m.g.f.) for U(a). This
gives a convenient relationship between the moments of U(a) and its associated
compound Poisson distribution.

On account of the multiplicative properties of ω(z) and φ(z) under the convolution
of G(x) and U(a) respectively, it is seen that the compound Poisson distributions
form a closed family, and if $G_1(x)$ and $G_2(x)$ are two compound Poisson
distributions associated with $U_1(a)$ and $U_2(a)$ respectively then $G_1(x) * G_2(x)$ is
associated with $U_1(a) * U_2(a)$. In addition, if U(a) is infinitely divisible (cf.
Cramér [1]) then G(x) is also, since it can be factored into the convolution of
arbitrarily many compound Poisson distributions.







Choosing in particular U(a) as the Pearson type III distribution, the asso- 
ciated function is the Polya-Eggenberger distribution, and if U(a) is a Poisson 
distribution the associated function is the Neyman contagious distribution of 
Type A. 
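
The identity ω(z) = φ(z) can be checked numerically for the last case mentioned, in which U(a) is itself a Poisson distribution. The sketch below is an illustration only: the mean of U is called `alpha`, the sums are truncated at `a_max` and `n_max`, and the function names are hypothetical. For this choice the m.g.f. of U is $\exp\{\alpha(e^{z} - 1)\}$.

```python
import math

def poisson_pmf(k, mean):
    """Probability that a Poisson variate with the given mean equals k."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def compound_poisson_pmf(n, alpha, a_max=80):
    """Saltus pi_n = sum_a P(U = a) * e^{-a} a^n / n!, with U Poisson(alpha)."""
    return sum(poisson_pmf(a, alpha) * poisson_pmf(n, float(a))
               for a in range(a_max + 1))

def fmgf(z, alpha, n_max=120):
    """Factorial moment generating function omega(z) = sum_n pi_n (1 + z)^n."""
    return sum(compound_poisson_pmf(n, alpha) * (1.0 + z) ** n
               for n in range(n_max + 1))

alpha, z = 1.5, 0.3                       # hypothetical example values
omega = fmgf(z, alpha)
phi = math.exp(alpha * (math.exp(z) - 1.0))   # m.g.f. of U evaluated at z
```

With these truncation points the two quantities `omega` and `phi` agree to well within rounding error for moderate `alpha` and `z`; larger values would need larger `a_max` and `n_max`.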


3. The generalized Poisson distribution. If F(x | a), defined for non-negative
integers a = 0, 1, 2, ···, is the a-fold convolution of a given distribution
F(x) with itself, i.e. $F(x \mid a) = F(x)^{a*}$, and U(a) is the Poisson distribution with
parameter α, then the distribution function

$G(x) = \int F(x \mid a)\,dU(a)$

is called the generalized Poisson distribution associated with F(x).
If Ω(z) is the f.m.g.f. of F(x) then for the f.m.g.f. of G(x) we have

$\omega(z) = \sum_{n=0}^{\infty}(\Omega(z))^{n}e^{-\alpha}\,\dfrac{\alpha^{n}}{n!} = e^{\alpha(\Omega(z) - 1)}.$

It follows that ω(z) can be written as $\prod_{\nu=1}^{k}\omega_\nu(z)$ for any positive integer k, where each $\omega_\nu(z)$ is the f.m.g.f. of a generalized
Poisson distribution, and thus ω(z) belongs to the infinitely divisible family. Moreover,
if $G_1(x)$ and $G_2(x)$ are two generalized Poisson distributions associated with
$F_1(x)$ and $F_2(x)$ with parameters $\alpha_1$ and $\alpha_2$ respectively, then $G(x) = G_1(x) * G_2(x)$
has for f.m.g.f.

$\omega_1(z)\omega_2(z) = \exp\left\{(\alpha_1 + \alpha_2)\left(\dfrac{\alpha_1\Omega_1(z) + \alpha_2\Omega_2(z)}{\alpha_1 + \alpha_2} - 1\right)\right\}$

and G(x) is again a generalized Poisson distribution function associated with
the distribution

$F(x) = \dfrac{\alpha_1F_1(x) + \alpha_2F_2(x)}{\alpha_1 + \alpha_2}$

and with the parameter $\alpha_1 + \alpha_2$. Thus the generalized Poisson distributions
form a closed family. The analytic nature of the generalized Poisson distributions
has been studied by Hartman and Wintner [3]. As noted by Feller [2]
the various Neyman contagious distributions are generalized Poisson distributions.


4. Further remarks. From the above observations it is clear that a necessary
and sufficient condition for a distribution to be a compound Poisson distribution
is that its f.m.g.f. be of the form

(1) $\omega_1(z) = \phi(z)$













where ¢(z) is the ordinary m.g.f. of a non-negative random variable. Likewise q 
necessary and sufficient condition for w(z) to be the f.m.g.f. of a generalized 
Poisson distribution is that it be of the form 


(2) wo (z) es oe, a> 0, 
where Q(z) is the f.m.g.f. of an arbitrary distribution function F(x). If we 
choose ¢(z) = e*"*~” and Q(z) = e%, then wi(z) = we(z), and the distribution 
whose f.m.g.f. is w:(z) (the Neyman contagious distribution of Type A) is simul- 
taneously a compound and a generalized Poisson distribution (ef. Feller [2}), 
We now show that there is an infinite class of distributions with this property. 

First note that if $\varphi(z)$ is the m.g.f. of an arbitrary distribution, then
$\exp\{a(\varphi(z) - 1)\}$ is also the m.g.f. of a d.f., and in fact is the m.g.f. of the
generalized Poisson distribution associated with the distribution whose m.g.f.
is $\varphi(z)$. Now let $\varphi(z)$ be the m.g.f. of an arbitrary non-negative random
variable, and define

(3) $\omega(z) = \exp\{a(\varphi(z) - 1)\}, \quad a > 0.$

Then $\omega(z)$ is simultaneously of the forms (1) and (2), since $\varphi(z)$ is, by (1), also
the f.m.g.f. of a distribution function, i.e. the compound Poisson distribution
associated with the distribution whose m.g.f. is $\varphi(z)$. However, not every
distribution which is both a compound and a generalized Poisson distribution can
be generated in this manner. For example, the Polya-Eggenberger distribution
is easily shown to be both a generalized and a compound Poisson distribution,
yet its f.m.g.f.

$$\omega(z) = (1 - dz)^{-h/d}, \quad d > 0,\ h > 0,$$

manifestly is not of the form (3), since this would imply that
$\varphi(iz) = 1 - \frac{h}{ad}\log(1 - diz)$ is a characteristic function. But $|\varphi(iz)|$ is
unbounded as $z \to \pm\infty$, and thus $\varphi(iz)$ is not the characteristic function of a
distribution.
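The unboundedness argument can be illustrated numerically. The sketch below assumes the reading $\omega(z) = (1 - dz)^{-h/d}$ for the Polya-Eggenberger f.m.g.f., so that form (3) would force $\varphi(iz) = 1 - \frac{h}{ad}\log(1 - diz)$. The function name and the parameter values are illustrative choices, not taken from the note.

```python
import cmath

def phi_candidate(z, a=1.0, d=0.5, h=2.0):
    """Candidate phi(iz) = 1 - (h / (a d)) log(1 - d i z) for real z.

    Assumes the f.m.g.f. (1 - d z)^(-h/d); a, d, h are illustrative values.
    """
    return 1 - (h / (a * d)) * cmath.log(1 - d * 1j * z)

# A characteristic function satisfies |phi| <= 1 everywhere; this candidate
# exceeds 1 in modulus and keeps growing (logarithmically) with |z|.
for z in (1.0, 10.0, 100.0, 1000.0):
    print(z, abs(phi_candidate(z)))
```

Any modulus above 1 already rules the candidate out as a characteristic function; the growth with |z| shows the failure is not an artifact of one particular argument.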
REFERENCES

[1] H. Cramér, "Problems in probability theory," Annals of Math. Stat., Vol. 18 (1947),
pp. 165-193.

[2] W. Feller, "On a general class of contagious distributions," Annals of Math. Stat.,
Vol. 14 (1943), pp. 389-400.

[3] P. Hartman and A. Wintner, "On the infinitesimal generators of integral
convolutions," Am. Jour. of Math., Vol. 64 (1942), pp. 272-279.




ON CONFIDENCE LIMITS FOR QUANTILES 


By Gottfried E. Noether


Columbia University 


In finding confidence limits for quantiles it is usual to determine two order
statistics $Z_i$ and $Z_j$ which with a given probability contain the unknown quantile
between them. The values of $i$ and $j$ corresponding to a given confidence
coefficient can be determined with the help of the distribution laws of order
statistics as is shown, e.g., in Wilks [1]. The purpose of this note is to
determine $i$ and $j$ with the help of a confidence band for the unknown cumulative
distribution function.

In what follows we shall always denote the cumulative distribution function
(cdf) by $F(x)$, i.e., $F(x) = P\{X \le x\}$. Then the quantile $q_p$ is determined by

(1) $F(q_p - 0) \le p \le F(q_p)$

which reduces to

(1') $F(q_p) = p$

if $F(x)$ is continuous. Given a sample of size $n$ we can construct the sample
cdf $F_n(x)$ defined by $F_n(x) = (1/n)$ (number of observations $\le x$). Confidence
coefficients will always be denoted by $1 - \alpha$.

Assume that we can construct two step functions $L(x)$ and $U(x)$ parallel to
$F_n(x)$ such that for any fixed value $x$

(2) $P\{L(x) < F(x) < U(x)\} = 1 - \alpha.$

We do not require that the confidence band determined by $L(x)$ and $U(x)$ cover
the graph of the unknown cdf $F(x)$ with probability $1 - \alpha$, but only that (2) be
true for any arbitrarily chosen value $x$.

Let

$$L(x) = \pi_k, \qquad U(x) = \theta_k$$

for $z_k \le x < z_{k+1}$, $k = 0, 1, \cdots, n$, where $z_k$ is the value taken by the order
statistic $Z_k$ and $z_0 = -\infty$, $z_{n+1} = +\infty$. Then if $F(x)$ is continuous it follows
from (2) that a confidence interval with confidence coefficient $1 - \alpha$ for $q_p$ is
given by

(3) $Z_i \le q_p < Z_j$

where $i$ and $j$ are determined by

(4) $\theta_{i-1} \le p, \quad \theta_i > p$

(5) $\pi_{j-1} < p, \quad \pi_j \ge p.$


It will be noted that (3) represents a half-open interval. However as long as
we only admit continuous cdf's the confidence coefficient is not changed if we use

(3') $Z_i < q_p < Z_j$

or

(3'') $Z_i \le q_p \le Z_j$

instead. This is no longer true if we also admit discontinuous cdf's. Then the
confidence coefficient connected with (3') is $\le 1 - \alpha$, while that connected with
(3'') is $\ge 1 - \alpha$, as follows immediately from consideration of the possible
outcomes when (1) is true. This is the same result as that obtained by Scheffé
and Tukey [2].

We shall now indicate how $\pi_k$ and $\theta_k$ can be obtained and find their values in
a particular case. For any arbitrary value $x$ we can consider $F_n(x)$ as the sample
estimate of the unknown parameter $p = F(x)$ of a binomial distribution. Clopper
and Pearson [3] have discussed how confidence intervals for the unknown
parameter of a binomial variate can be found. Thus we can determine $\pi_k$
and $\theta_k$ correspondingly, but as is well known (2) cannot be achieved with
probability exactly equal to $1 - \alpha$. We shall have to be satisfied with probability
$\ge 1 - \alpha$. Consequently the same will hold true for the confidence coefficient
connected with the confidence interval for $q_p$.

In many cases central confidence intervals seem to be more desirable, at least
intuitively, than others. Our method produces such central confidence intervals
for the unknown quantile if we use central confidence intervals in the
construction of the confidence band. In that case $\pi_k$ and $\theta_k$ are determined by

(6) $\dfrac{\alpha}{2} = \sum_{i=k}^{n} \binom{n}{i} \pi_k^{\,i} (1 - \pi_k)^{n-i} = I_{\pi_k}(k,\ n - k + 1)$

(7) $\dfrac{\alpha}{2} = \sum_{i=0}^{k} \binom{n}{i} \theta_k^{\,i} (1 - \theta_k)^{n-i} = I_{1-\theta_k}(n - k,\ k + 1),$

except that $\pi_0 = 0$, $\theta_n = 1$ by definition, where

$$I_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\,dt \Big/ \int_0^1 t^{p-1}(1 - t)^{q-1}\,dt$$

is the incomplete beta function. Scheffé [4] has pointed out how the tables of
percentage points of the incomplete beta function by C. M. Thompson, etc.
[5] can be used to find $\pi_k$ and $\theta_k$.
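The construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's procedure: $\pi_k$ and $\theta_k$ are taken to be the central Clopper-Pearson limits solved from binomial tail sums by bisection (in place of the incomplete beta tables), and (4) and (5) are read as "smallest $i$ with $\theta_i > p$" and "smallest $j$ with $\pi_j \ge p$". All function names are hypothetical.

```python
from math import comb

def binom_upper_tail(n, k, p):
    """P{Bin(n, p) >= k}."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def binom_lower_tail(n, k, p):
    """P{Bin(n, p) <= k}."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iters=200):
    """Bisection for f(x) = target with f nondecreasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def central_band(n, alpha):
    """Lists pi[k], theta[k] for k = 0..n, with pi[0] = 0 and theta[n] = 1."""
    pi = [0.0] * (n + 1)
    theta = [1.0] * (n + 1)
    for k in range(1, n + 1):
        # Lower limit: P{Bin(n, pi_k) >= k} = alpha/2 (increasing in pi_k).
        pi[k] = solve_increasing(lambda p: binom_upper_tail(n, k, p), alpha / 2)
    for k in range(0, n):
        # Upper limit: P{Bin(n, theta_k) <= k} = alpha/2 (decreasing in theta_k).
        theta[k] = solve_increasing(lambda p: -binom_lower_tail(n, k, p), -alpha / 2)
    return pi, theta

def quantile_interval_indices(n, p, alpha):
    """Indices (i, j) such that the interval for q_p runs from Z_i to Z_j."""
    pi, theta = central_band(n, alpha)
    i = next(k for k in range(0, n + 1) if theta[k] > p)
    # j = n + 1 stands for the conventional value z_{n+1} = +infinity.
    j = next((k for k in range(0, n + 1) if pi[k] >= p), n + 1)
    return i, j

print(quantile_interval_indices(10, 0.5, 0.05))
```

For the median the two indices come out symmetric ($j = n - i + 1$), which is the property the note goes on to prove.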

We shall show now that in the case of the median $M$ the solution based on
(3)-(7) leads to the same confidence interval as that suggested originally by W.
R. Thompson [6]. Thompson found that for $k < (n + 1)/2$

(8) $P\{Z_k < M < Z_{n-k+1}\} = 1 - 2 I_{1/2}(n - k + 1,\ k)$

provided the unknown distribution had a continuous cdf. (8) can be used to
maximize $k$ under the condition that the righthand side is $\ge 1 - \alpha$.
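As a worked illustration of this maximization, the sketch below evaluates the right-hand side of (8) through the identity $I_{1/2}(n - k + 1, k) = P\{\mathrm{Bin}(n, \tfrac12) \le k - 1\}$. The restriction $k < (n + 1)/2$ is assumed here so that $Z_k$ lies below $Z_{n-k+1}$; the function names are illustrative.

```python
from math import comb

def median_coverage(n, k):
    """Right-hand side of (8): 1 - 2 P{Bin(n, 1/2) <= k - 1}."""
    tail = sum(comb(n, i) for i in range(k)) / 2**n
    return 1 - 2 * tail

def largest_k(n, alpha):
    """Largest k < (n + 1)/2 with coverage >= 1 - alpha, or None if none exists."""
    best = None
    k = 1
    while 2 * k < n + 1:
        if median_coverage(n, k) >= 1 - alpha:
            best = k
        k += 1
    return best

# Example: n = 10 observations, confidence coefficient at least 0.95.
k = largest_k(10, 0.05)
print(k, median_coverage(10, k))
```

Because the coverage decreases as $k$ grows, the loop could stop at the first failure; scanning the whole range simply keeps the sketch short.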

We shall first show that our method leads to the same kind of a confidence
interval, i.e., one with $j = n - i + 1$. This follows immediately from the
fact that by (6) and (7)

(9) $1 - \theta_i = \pi_{n-i}.$

For let

(10) $\theta_{i-1} \le \tfrac{1}{2}$ and $\theta_i > \tfrac{1}{2}$,

then by (9) $\pi_{n-i} < \tfrac{1}{2}$ and $\pi_{n-i+1} \ge \tfrac{1}{2}$.

It remains to be shown that $k$ as determined by (8) equals $i$. This will be so
if we can show that

(11) $I_{1/2}(n - i + 1,\ i) \le \dfrac{\alpha}{2} < I_{1/2}(n - i,\ i + 1).$

Remembering that $I_x(p, q)$ is a monotonically increasing function of $x$ we get
with the help of (7) and (10)

$$\frac{\alpha}{2} = I_{1-\theta_{i-1}}(n - i + 1,\ i) \ge I_{1/2}(n - i + 1,\ i)$$

and

$$\frac{\alpha}{2} = I_{1-\theta_i}(n - i,\ i + 1) < I_{1/2}(n - i,\ i + 1),$$

which proves (11).

In conclusion it may be worth while pointing out that the formula

$$P\{Z_i < q_p < Z_j\} = I_p(i,\ n - i + 1) - I_p(j,\ n - j + 1)$$

given, e.g., in Wilks [1] for the continuous case can be obtained by a slight
modification of (6).


REFERENCES

[1] S. S. Wilks, "Order statistics," Am. Math. Soc. Bull., Vol. 54 (1948), pp. 6-50.

[2] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order
statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.

[3] C. J. Clopper and E. S. Pearson, "The use of confidence or fiducial limits illustrated
in the case of the binomial," Biometrika, Vol. 26 (1934), pp. 404-413.

[4] H. Scheffé, "Note on the use of the tables of percentage points of the incomplete beta
function to calculate small sample confidence intervals for binomial p,"
Biometrika, Vol. 33 (1944), p. 181.

[5] C. M. Thompson, E. S. Pearson, L. J. Comrie, and H. O. Hartley, "Tables of
percentage points of the incomplete beta function," Biometrika, Vol. 32 (1941), pp.
151-181.

[6] W. R. Thompson, "On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form," Annals of Math. Stat.,
Vol. 7 (1936), pp. 122-128.




A LOWER BOUND FOR THE EXPECTED TRAVEL AMONG m RANDOM 
POINTS 
By Eli S. Marks


Bureau of the Census 


In connection with cost determinations in sampling problems, it is frequently
necessary to determine the amount of travel among $m$ random sample points in
an area. A lower bound for the expected value of this distance is found to be:

$$\frac{1}{2}\sqrt{A}\;\frac{m - 1}{\sqrt{m}},$$

where $A$ is the measure of the area from which the $m$ random points are
drawn.¹

If in a finite area S we locate m points at random (see Figure 1), we can trace 
a continuous path among the m points by starting at some point and connecting 
the points by line segments. The points can be connected in any order so that 
the path touches each point only once (unless it intersects itself at one of the 
random points). We are interested in a lower bound for the expected value of 
the length of the shortest of the m! possible paths. 


Fig. 1. m random points in S.


We have above an area S in which m random points have been selected (with 
m = 14). 

The shortest path among the $m$ points consists of $m - 1$ "links" (line segments)
between two points. Each link can be assigned to one of its end points, leaving
some pre-designated point (e.g., the $m$-th point selected) with no link assigned.
The link assigned to the $i$-th random point ($z_{(i)}$) must be no less than $r_{(i)}$, the
distance from $z_{(i)}$ to the nearest of the other $(m - 1)$ points. If we denote the
length of the shortest path by $L$:

$$L \ge \sum_{i=1}^{m-1} r_{(i)},$$

$$E(L) \ge \sum_{i=1}^{m-1} E(r_{(i)}).$$


Let $E_x(r_{(i)})$ be the expected value of $r_{(i)}$ conditional upon $z_{(i)}$ falling at the
point $x$ in $S$ and let $F(r \mid x)$ be the conditional distribution function of $r_{(i)}$ for
$z_{(i)} = x$. Thus $F(r \mid x)$ is the conditional probability of $r_{(i)} \le r$ or the
probability of one or more of the $(m - 1)$ random points other than $z_{(i)}$ falling
inside a circle, $C_r$, with radius $r$ and center at $x$ (see Fig. 1). Then, we have:

$$E_x(r_{(i)}) = \int_0^{+\infty} r\,dF(r \mid x),$$

$$F(r \mid x) = 1 - \left(\frac{M(S) - M(SC_r)}{M(S)}\right)^{m-1},$$

where $M(S)$ and $M(SC_r)$ are the measures of $S$ and $SC_r$, so that $M(SC_r)/M(S)$ is the
probability of a random point in $S$ falling into $C_r$.

¹ The lower bound obtained is similar in form to the expression for distance traveled
among a set of random points used by Mahalanobis [2] and Jessen [1].


Let $A = M(S)$ and construct a circle $C$ with center at $x$ and radius $\rho = \sqrt{A/\pi}$.
Then $M(C) = A = M(S)$. Let $d$ be the distance from $x$ to the nearest of
$(m - 1)$ points selected at random from $C$ and let $G(r)$ be the distribution
function of $d$. Then we have:

$$E(d) = \int_0^{+\infty} r\,dG(r),$$

$$G(r) = 1 - \left(\frac{M(C) - M(C_r C)}{M(C)}\right)^{m-1}.$$

For $r \le \rho$,
$$M(C_r C) = M(C_r) \ge M(SC_r).$$
For $r > \rho$,
$$M(C_r C) = M(C) = M(S) \ge M(SC_r).$$
Thus, since $M(C_r C) \ge M(SC_r)$, we have for all $x$ in $S$:
$$G(r) \ge F(r \mid x),$$
and thus,
$$E(d) \le E_x(r_{(i)}).$$
Since $E(d) \le E_x(r_{(i)})$ for all $x$ in $S$:
$$E(d) \le E(r_{(i)}),$$
$$(m - 1)E(d) \le \sum_{i=1}^{m-1} E(r_{(i)}) \le E(L).$$
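It only remains to evaluate $E(d)$, the expected distance from the center of a
circle to the nearest of $(m - 1)$ random points. This can be done very easily
by substituting in the expression for $G(r)$:

$$A = M(C),$$

$$\pi r^2 = M(C_r C), \quad \text{when } r \le \rho = \sqrt{A/\pi}.$$

This gives

$$G(r) = 1 - \left(\frac{A - \pi r^2}{A}\right)^{m-1},$$

$$G'(r) = \frac{2\pi r}{A}\,(m - 1)\left(\frac{A - \pi r^2}{A}\right)^{m-2},$$

$$E(d) = \int_0^{\rho} r\,G'(r)\,dr = \frac{1}{2}\sqrt{\frac{A}{\pi}}\;B(m, \tfrac{1}{2}),$$

where $B(m, \tfrac{1}{2})$ is the complete Beta function.

Since $\sqrt{m}\,[B(m, \tfrac{1}{2})] > \sqrt{\pi}$:

$$E(d) > \frac{1}{2}\sqrt{\frac{A}{m}}.$$

Thus, we have:

$$E(L) > \frac{1}{2}\sqrt{A}\;\frac{m - 1}{\sqrt{m}}.$$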


















It is obvious that the development is general and applies to m random points 
in any bounded two-dimensional Borel set. However, the lower bound ob- 
tained will, in general, be useful only when S is a connected region. 
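For cost work the bound is easy to tabulate. The sketch below assumes the closed form $\tfrac12\sqrt{A}\,(m - 1)/\sqrt{m}$ together with the intermediate expression $E(d) = \tfrac12\sqrt{A/\pi}\,B(m, \tfrac12)$; the function names are illustrative and not part of the note.

```python
from math import exp, lgamma, pi, sqrt

def beta_half(m):
    """Complete Beta function B(m, 1/2), computed through log-gamma."""
    return exp(lgamma(m) + lgamma(0.5) - lgamma(m + 0.5))

def expected_nearest_distance(area, m):
    """E(d): centre of a circle of the given area to the nearest of m - 1 points."""
    return 0.5 * sqrt(area / pi) * beta_half(m)

def travel_lower_bound(area, m):
    """Closed-form bound (1/2) sqrt(A) (m - 1) / sqrt(m)."""
    return 0.5 * sqrt(area) * (m - 1) / sqrt(m)

def sharper_bound(area, m):
    """(m - 1) E(d): the bound before B(m, 1/2) is replaced by sqrt(pi / m)."""
    return (m - 1) * expected_nearest_distance(area, m)

# Example with unit area and the m = 14 points of Fig. 1.
print(travel_lower_bound(1.0, 14), sharper_bound(1.0, 14))
```

The second figure is never smaller than the first, since $\sqrt{m}\,B(m, \tfrac12) > \sqrt{\pi}$; the closed form is simply more convenient to use.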


REFERENCES

[1] Raymond J. Jessen, "Statistical investigation of a sample survey for obtaining farm
facts," Iowa State College Research Bulletin 304 (1942).

[2] P. C. Mahalanobis, "A sample survey of the acreage under jute in Bengal," Sankhya,
Vol. 4 (1940), pp. 511-530.




A MATRIX ARISING IN CORRELATION THEORY¹
By H. M. Bacon


Stanford University


1. Introduction. In the study of time series, it is frequently desirable to
consider correlations between observations made in different years. Let $x_{i1},
x_{i2}, \cdots, x_{im}$ be $m$ values of the variable $x_i$, expressed as deviations from their
arithmetic mean, where $x_i$ is a variable observed in the $i$th year ($i = 1, 2, \cdots, n$).
Let $\sigma_i$ be the standard deviation of $x_i$. If we denote by $r_{ij} = r_{ji}$ the
correlation of $x_i$ with $x_j$, and if we assume the $x_i$ to be normally distributed, then

$$z = \frac{1}{(2\pi)^{n/2}\,\sigma_1 \sigma_2 \cdots \sigma_n \sqrt{R}}\,\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{R_{ij}\,x_i x_j}{R\,\sigma_i \sigma_j}\right\}$$

is the frequency function giving the distribution. Here $R$ is the determinant
$|r_{ij}|$ of the correlation coefficients, and $R_{ij}$ is the cofactor of the element $r_{ij}$
in this determinant.

¹ A linear correlogram is considered by Cochran in his paper, "Relative accuracy of
systematic and stratified random samples for a certain class of populations," (Annals of Math.
Stat., Vol. 17 (1946), pp. 164-177) in which $\rho_u = 1 - u/L$. Setting $u = |i - j|$ and $L = 1/p$, we
have the case considered above.

We may make various assumptions regarding the behavior of the correlation
coefficients over the $n$ years. One such assumption of some interest is that the
correlation coefficients diminish in such a way that

$$r_{ij} = r_{ji} = 1 - |i - j|\,p$$

where $p$ is a fixed positive number not greater than $2/(n - 1)$. Under these
circumstances, we can evaluate $R$ and $R_{ij}$ in terms of $n$ and $p$.


2. Evaluation of R. We may let $R(p)$ represent the determinant $R$ of order
$n$ whose element in the $i$th row and $j$th column is $r_{ij} = r_{ji} = r_{n-i,n-j} =
r_{n-j,n-i} = 1 - |i - j|\,p$ where, for the purpose of evaluation, $p$ is any real
number. Since each two-rowed minor of $R(p)$ is divisible by $p$, $R(p)$ is divisible
by $p^{n-1}$. Furthermore, since $R(p)$ is a polynomial in $p$ of degree at most $n$, we
have

$$R(p) = Ap^n + Bp^{n-1} = p^{n-1}(Ap + B).$$

If we set $p = 1$ and $p = -1$, we find $A + B = R(1)$ and $R(-1) = (-1)^{n-1}(-A + B)$
so that $-A + B = (-1)^{n-1}R(-1)$. By elementary methods we
find that $R(1) = 2^{n-2}(3 - n)$ and $R(-1) = (-1)^{n-1}2^{n-2}(n + 1)$. Hence
$$A + B = 2^{n-2}(3 - n)$$
and
$$-A + B = 2^{n-2}(n + 1).$$
Solving for $A$ and $B$ we find that
$$R = R(p) = 2^{n-2}p^{n-1}[2 - (n - 1)p].$$
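3. Evaluation of $R_{ij}$. Similar methods yield the following values for the
cofactors $R_{ij}$ of the elements of $R$:
$$R_{11} = R_{nn} = 2^{n-3}p^{n-2}[2 - (n - 2)p],$$
$$R_{22} = R_{33} = \cdots = R_{n-1,n-1} = 2^{n-2}p^{n-2}[2 - (n - 1)p],$$
$$R_{1n} = R_{n1} = 2^{n-3}p^{n-1},$$
$$R_{i,i+1} = R_{i+1,i} = -2^{n-3}p^{n-2}[2 - (n - 1)p],$$
otherwise,

$$R_{ij} = 0.$$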
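These closed forms can be checked by exact arithmetic for small $n$. The sketch below builds the matrix $r_{ij} = 1 - |i - j|\,p$ with rational entries and compares a determinant computed by elimination with $2^{n-2}p^{n-1}[2 - (n - 1)p]$. The helper names are illustrative, and the elimination routine is a generic one rather than the "elementary methods" of the note.

```python
from fractions import Fraction

def correlation_matrix(n, p):
    """Matrix with entries r_ij = 1 - |i - j| p (exact when p is a Fraction)."""
    return [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]

def det(a):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [row[:] for row in a]
    n = len(a)
    sign, result = 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def cofactor(a, i, j):
    """Signed cofactor of element (i, j), with 0-based indices."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(minor)

def closed_form_R(n, p):
    """The stated value 2^(n-2) p^(n-1) [2 - (n - 1) p]."""
    return Fraction(2) ** (n - 2) * p ** (n - 1) * (2 - (n - 1) * p)

p = Fraction(1, 5)
for n in range(2, 7):
    print(n, det(correlation_matrix(n, p)) == closed_form_R(n, p))
```

The same `cofactor` helper can be compared against the listed values of $R_{11}$, $R_{1n}$ and $R_{i,i+1}$, and used to confirm that the remaining cofactors vanish.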










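4. The frequency function. The quadratic form appearing in the exponent
in the expression for the frequency function can now be written as

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{R_{ij}\,x_i x_j}{R\,\sigma_i \sigma_j}
= \frac{2 - (n - 2)p}{2p[2 - (n - 1)p]}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_n^2}{\sigma_n^2}\right)
+ \frac{1}{p}\left(\frac{x_2^2}{\sigma_2^2} + \frac{x_3^2}{\sigma_3^2} + \cdots + \frac{x_{n-1}^2}{\sigma_{n-1}^2}\right)$$

$$\qquad + \frac{1}{2[2 - (n - 1)p]}\left(\frac{x_1 x_n}{\sigma_1 \sigma_n} + \frac{x_n x_1}{\sigma_n \sigma_1}\right)
- \frac{1}{2p}\left(\frac{x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2 x_1}{\sigma_2 \sigma_1} + \frac{x_2 x_3}{\sigma_2 \sigma_3} + \frac{x_3 x_2}{\sigma_3 \sigma_2} + \cdots + \frac{x_n x_{n-1}}{\sigma_n \sigma_{n-1}}\right).$$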
ee 2 2 n—1 2 a 
“dan ada tea 2 
1 1 2n 
tenga 
5. Maximum likelihood. The expression $z$ is the likelihood of getting a
particular set of values of the variables $x_1, x_2, \cdots, x_n$. It is often important
to regard the $r_{ij}$ and the $\sigma_i$ as parameters and to determine them so that the
likelihood will be a maximum. If we assume $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$, then

$$z = \frac{1}{(2\pi)^{n/2}\,\sigma^n \sqrt{R}}\,\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{R_{ij}\,x_i x_j}{R\,\sigma^2}\right\}.$$

The question, in our case, now becomes, What values of $p$ and $\sigma$ will make $z$
a maximum for given $x_i$? Necessary conditions are that $\partial z/\partial p = 0$ and
$\partial z/\partial \sigma = 0$.

Since $R_{ij}$ and $R$ are given in terms of $p$, the process of differentiation can be carried
out (first take the logarithm of $z$), and values of $p$ and $\sigma$ necessary for a maximum
determined. It is, of course, possible that $z$ has no maximum, and the sufficiency
of these values must be tested. The computations for the general case are
laborious, though straightforward. Furthermore, because of the complicated
nature of the coefficients in the equation to be solved for $p$, the general solution
is not readily obtainable. This equation is, however, of third degree, and it can
be solved in any particular case.




TABLE OF NORMAL PROBABILITIES FOR INTERVALS OF VARIOUS 
LENGTHS AND LOCATIONS 


By W. J. Dixon
University of Oregon 
1. Introduction. The probability associated with a particular finite range of
values is often desired. The usual tables of normal areas give values for
$\int_0^x$ or $\int_{-\infty}^x$, as in the table by Salvosa [1]. The WPA table [2] gives
$\int_{-x}^{x}$. The author has deposited with Brown University a table of
$\int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l}$ for values of $x$ [0(.1)5.0] and values of $l$ [0(.1)10.0]. The
values in the table may be interpreted as the
probability that an observation from a normal population with unit variance
will fall in an interval of length $l$ whose midpoint is a distance $x$ from the mean.
These values can be obtained by a simple computation from the existing tables.
Since values were being used frequently, the present table was constructed.
Microfilm or photostat copies may be obtained upon request to the Brown
University Library.


2. Computation. The values were obtained by finding the difference between
the integrals $\int_{-\infty}^{x-\frac{1}{2}l}$ and $\int_{-\infty}^{x+\frac{1}{2}l}$ as given to six decimal places in
Salvosa's table. Being differences, the values are subject to an error of 1 unit in
the sixth place.

For values of $x + \frac{1}{2}l$ greater than 5, the values can be obtained by computing
$1 - \int_{-\infty}^{x-\frac{1}{2}l}$. The search for errors was aided by computing column sums; i.e.

(1) $\displaystyle\sum_{i=1}^{50}\int_{x_i-\frac{1}{2}l}^{x_i+\frac{1}{2}l} + \frac{1}{2}\int_{-\frac{1}{2}l}^{\frac{1}{2}l} = \frac{n}{2},$

where $i$ represents the row number and $n$ represents the column number. For
example, $n = 17$ corresponds to the column for $l = 1.7$. The approximation becomes
poorer as $n$ increases but the sums were still useful for checking purposes.
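The tabulated quantity and the column-sum check are easy to reproduce today. The sketch below obtains the normal integral from the error function rather than from Salvosa's table, and assumes the column sum runs over the rows $x_i = 0.1, 0.2, \cdots, 5.0$ with half weight on the row $x = 0$ and the target value $n/2$. The function names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Unit normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def interval_probability(x, length):
    """Probability that a unit normal falls in the interval x -/+ length/2."""
    return normal_cdf(x + length / 2) - normal_cdf(x - length / 2)

def column_sum(n):
    """Check sum for the column l = n/10: rows 0.1..5.0 plus half the row x = 0."""
    length = n / 10
    total = sum(interval_probability(i / 10, length) for i in range(1, 51))
    return total + 0.5 * interval_probability(0.0, length)

for n in (5, 17, 50, 100):
    print(n, column_sum(n), n / 2)
```

The sum is in effect a trapezoidal approximation to ten times the integral of the tabulated function over $0 \le x \le 5$. For long intervals an appreciable part of that integral lies beyond $x = 5$, which is why the agreement with $n/2$ deteriorates as $n$ grows.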


3. Example. The table has been used in studies of the expected proportion
of a line covered by intervals dropped on it according to some normal probability
function. Let $P_n(x)$ be the probability that the point $x$ is covered at least once
when $n$ intervals are dropped on the $x$-axis. H. E. Robbins [3] gives the
expression:

(2) $E(F) = \dfrac{1}{L}\displaystyle\int_0^L P_n(x)\,dx$

for the expected proportion of a line of length $L$ covered at least once by these
intervals.

Let $f(x)\,dx$ be the probability that an interval falls with its center in $dx$ and $l$
be the length of the interval. The probability that a point $x$ will be covered by
one interval dropped on the $x$-axis is:

(3) $g(x) = \displaystyle\int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l} f(t)\,dt.$

When $n$ intervals are dropped, the probability that $x$ is covered at least once is

(4) $P_n(x) = 1 - (1 - g(x))^n,$

and

(5) $E(F) = 1 - \dfrac{1}{L}\displaystyle\int_0^L (1 - g(x))^n\,dx.$

When $k$ groups of $n_i$ intervals are dropped according to, say, normal distributions
with different means,

(6) $P(x) = 1 - \displaystyle\prod_{i=1}^{k}(1 - g_i(x))^{n_i},$

where

(7) $g_i(x) = \displaystyle\int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l} f_i(t)\,dt$

and we obtain

(8) $E(F) = 1 - \dfrac{1}{L}\displaystyle\int_0^L \prod_{i=1}^{k}(1 - g_i(x))^{n_i}\,dx.$


The values g(x) are those given in the table and are useful in evaluating the 
integrals in (5) and (8) by numerical methods. 
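A numerical evaluation of the kind described might look as follows. The sketch assumes that interval centres in group $i$ are normal with unit variance and mean `mean_i`, and that the integral over $0 \le x \le L$ is approximated by the trapezoidal rule. The function names, the `groups` argument and the step count are illustrative.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Unit normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def g(x, length, mean=0.0):
    """Probability that one interval of the given length covers the point x."""
    return normal_cdf(x - mean + length / 2) - normal_cdf(x - mean - length / 2)

def expected_coverage(L, length, groups, steps=2000):
    """Expected covered proportion of [0, L]; groups is a list of (n_i, mean_i)."""
    h = L / steps
    total = 0.0
    for s in range(steps + 1):
        x = s * h
        miss = 1.0  # probability that no interval covers x
        for n_i, mean_i in groups:
            miss *= (1.0 - g(x, length, mean_i)) ** n_i
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * miss
    return 1.0 - (h * total) / L

# One group of 5 intervals versus two groups aimed at different points.
print(expected_coverage(2.0, 1.0, [(5, 1.0)]))
print(expected_coverage(2.0, 1.0, [(3, 0.5), (2, 1.5)]))
```

A single group reproduces (5), and several groups give (8). With table values of $g(x)$ at spacing 0.1, the same rule would use those entries in place of the `g` function.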


REFERENCES
[1] Luis R. Salvosa, "Tables of Pearson's Type III functions," Annals of Math. Stat., Vol.
1 (1930), p. 191.
[2] National Bureau of Standards, Tables of Probability Functions, Vol. 2 (1942).
[3] H. E. Robbins, "On the measure of a random set," Annals of Math. Stat., Vol. 15 (1944).




CORRECTION TO “A NOTE ON THE FUNDAMENTAL IDENTITY OF 
SEQUENTIAL ANALYSIS”’ 


By G. E. ALBERT 
University of Tennessee

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), pp. 593-
596), the proof of Lemma 3 is incorrect. The following correct proof is due to
Mr. C. R. Blyth of the Institute of Statistics, University of North Carolina.

It is easy to establish the equation

$$P(n = N \mid F)\,[\varphi(t)]^{-N} = P(n = N \mid G)\,E_{n=N}[\exp(-tZ_N) \mid G],$$

where $E_{n=N}(u \mid G)$ denotes the conditional expectation of $u$ under the condition
that $n = N$ for any fixed integer $N$. By Wald [2], equations (2.4) and (2.6),
there exists a finite constant $C$ independent of $N$ which dominates the expected
values $E_{n=N}[\exp(-tZ_N) \mid G]$ for every $N$. Thus

(A) $P(n = N \mid F)\,[\varphi(t)]^{-N} \le C \cdot P(n = N \mid G).$

By Stein's theorem [3], there is a positive number $t_1$ such that $E(\exp nt_1 \mid G)$ is
finite. But by (A),

$$E\{\exp n[t_1 - \log \varphi(t)]\} \le C \cdot E(\exp nt_1 \mid G),$$

and Lemma 3 is proved.




CORRECTION TO “ON THE CHARLIER TYPE B SERIES’”’ 


By S. Kullback
George Washington University 
In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), p. 575), 


the phrase ‘‘so that ... &, > 1” on lines 5 and 6 should be deleted. I am grate- 
ful to Prof. Ralph P. Boas, Jr. for calling this to my attention. 




































































ABSTRACTS OF PAPERS 
Presented June 22-24, 1948 at the Berkeley Meeting of the Institute 


1. Estimation of Parameters for Truncated Multinormal Distributions. Z. W.
BIRNBAUM, E. PAULSON and F. C. ANDREWS, University of Washington.


Let $X_{(N)} = (X_1, \cdots, X_p, X_{p+1}, \cdots, X_N)$ be an $N$-dimensional random variable with a
non-singular normal distribution, and let the expectations, variances and covariances of
$X_{p+1}, \cdots, X_N$ be known. A large sample of $X_{(N)}$ is available, obtained under some
side-condition on $(X_{p+1}, \cdots, X_N)$; this side-condition may be a truncation of any kind or, more
generally, a selection; i.e. imposing on $X_{p+1}, \cdots, X_N$ a probability-distribution different
from the original marginal distribution. A method is developed for estimating, from such a
large sample with a side condition, all the missing parameters of the original distribution of
$X_{(N)}$, that is the expectations, variances and covariances of $X_1, \cdots, X_p$, and the
covariances $\sigma_{X_j X_k}$ for $j = 1, \cdots, p$ and $k = p + 1, \cdots, N$. This method does not require the
knowledge of the side-condition. (This paper was prepared under the sponsorship of the
Office of Naval Research.)


2. A Test of the Hypothesis that a Sample of Three Came from the Same Normal
Distribution. CARL A. BENNETT, General Electric Company, Hanford
Works, Richland, Washington.


In the control of the precision of chemical analyses performed in duplicate, a test
sometimes becomes necessary as to whether three determinations can reasonably be assumed to
have arisen from the same normal population. A critical region for testing this hypothesis
is given by $R \ge R_0$, where $R = D/d$, $D$ being the maximum and $d$ the minimum difference
between the three values, and $R_0$ is determined by integration over the upper tail of the
Cauchy distribution. It can easily be seen that this test is equivalent to a t-test between a
sample of one and a sample of two.


3. A Note on the Application of the Abbreviated Doolittle Solution to Non-
Orthogonal Analysis of Variance and Covariance. CARL A. BENNETT,
General Electric Company, Hanford Works, Richland, Washington.


S. S. Wilks has shown that the sums of squares necessary to the tests commonly made in
non-orthogonal analyses of variance or covariance can in general be reduced to the ratio of
two determinants. If several determinantal operations are performed to remove the
singular principal minors, the abbreviated Doolittle solution yields these sums of squares
directly. A combination of this technique and the calculational methods advanced by
Wald and Yates greatly reduces the tedium of calculation in this type of analysis.


4. Yield Trials with Backcrossed Derived Lines of Wheat. G. A. BAKER and
F. N. BRIGGS, University of California, Davis.


Strains of White Federation 38 and Baart 38 Wheats derived by backcrossing sufficient
to insure a high degree of homogeneity for all genetic factors were grown in conventional
yield trials. The results were somewhat contradictory and led to a critical examination of
such trials. The assumption that the deviations of yields in field trials from the specified
pattern are random with uniform variance and expectation zero is not sufficiently realistic.
We are led to consider a mathematical model which assumes a set of fertility levels upon
which a random element is superimposed. On the basis of this model it is possible to
account for the low observed correlations between residuals and plot yields. In such a
model the variance ratio F may be approximately unbiased but then its variance is smaller 
than under conventional assumptions. On the other hand, the expected value of F may be 
greater than one and sufficiently large so that ‘‘significant differences’’ between strains will 
always be found due to the differences in fertility levels. In such cases the results of the 
experiment may be misinterpreted. Transformations, in the ordinary sense of the word, 
will not bring such data into conformity with the conventional model. In order to bring the 
correlation between residuals and plot yields down to a sufficiently low level it is necessary 
to concentrate most of the variation in fertility levels into a few plots. That this is not 
unreasonable is borne out by agronomic observations. This model also explains the 
absence of correlation between the yields of strains as determined in two different trials on 
the same set of strains. 


5. The Selection of the Largest of a Number of Means. CHARLES M. STEIN,
University of California, Berkeley.


Suppose $X_{ij}$, $i = 1, \cdots, p$; $j = 1, 2, \cdots$ are independently normally distributed with
means $\xi_i + \eta_j$ and variances $\sigma_j^2$ where $\xi_i$, $\eta_j$ are unknown but $\sigma_j^2$ are known. $\epsilon$, $\alpha$ are fixed
numbers with $0 < \epsilon$, $0 < \alpha < 1$. It is desired to select, by a sequential procedure, in which
we take first the observations with second subscript 1, etc., an integer $M$ among $1, \cdots, p$
such that, for every $k = 1, \cdots, p$ and $\xi_1, \cdots, \xi_p, \eta_1, \eta_2, \cdots$ satisfying $\xi_k \ge \xi_j + \epsilon$ for all
$j \ne k$, $P\{M = k\} \ge 1 - \alpha$. In accordance with the following rule, one decides at each stage
(after the observations with second subscript $n$) to take no more observations with certain
first subscripts. For each $n = 1, 2, \cdots$ and each $l = 1, \cdots, p$ compute
P = r V e(t; ay 1) 
ohn, ~~ 
j=l Oj t; 


where $\bar{X}_j$ is the average of the observations with second subscript $j$ and $t_j$ is the number of
such observations. Continue taking observations $X_{l,n+1}, \cdots$ for those $l$ for which this
expression is greater than $(\ln \alpha)/\epsilon$ but not for the others. Eventually there will be at most
one subscript $l = 1, \cdots, p$ for which one continues to take observations and if there is one
this is chosen to be $M$. If there is none, the $l$ for which the sum is largest is chosen to be $M$.
This procedure is a straightforward application of the Lemma on p. 146 of Wald's Sequential
Analysis, and generalizations can easily be found.


6. The Effect of Inbreeding on Height at Withers in a Herd of Jersey Cattle.
W. C. ROLLINS, S. W. MEAD, and W. M. REGAN, University of California,
Davis.


The data consist of measurements of height at withers of about 200 females for various 
ages from one month to five years. The intensity of inbreeding as measured by Wright’s 
coefficient of inbreeding averaged 15 per cent and reached as high as 44 per cent in a few 
cases. 

An intra-sire covariance analysis of height and per cent of inbreeding was made for 
various ages from the first month to the fifty-fourth month. 

The results of the statistical analysis indicate that the inbred animals are shorter at one 
month of age and grow more slowly up to about the sixth month than do the outcrossed 
animals, but that from the sixth month on the inbreds begin to catch up with the outcrossed 
so that at maturity there is no significant difference in height. 























































































7. An Example of a Singular Continuous Distribution. HENRY SCHEFFÉ,
University of California at Los Angeles.


Simple and "natural" examples of singular continuous probability distributions are of
pedagogical interest. They are trivially available in the $k$-variate case for $k > 1$. A
univariate example may be obtained from the notion of a sequence of independent trials of
an event with constant probability $p$ of success, a notion familiar to the student and
indispensable in elementary probability theory. The (real-valued) random variable $X$ is taken
to be the dyadic representation of the sequence of results (1 and 0, respectively, for success
and failure). It is known that $X$ has a singular continuous distribution for $p \ne 0, \tfrac{1}{2}, 1$.
This result may be proved by using only the Tchebycheff inequality together with the
formulas for the mean and variance of the binomial distribution.


8. On the Theory of Some Non-Parametric Hypotheses. ERICH L. LEHMANN
and CHARLES STEIN, University of California, Berkeley, California.


For two types of non-parametric hypotheses optimum tests are derived against certain
classes of alternatives. The two kinds of hypotheses are related and may be illustrated by
the following example: (1) The joint distribution of the variables $X_1, \cdots, X_m, Y_1, \cdots,
Y_n$ is invariant under all permutations of the variables; (2) the variables are independently
and identically distributed. It is shown that the theory of optimum tests for hypotheses
of the first kind is the same as that of optimum similar tests for hypotheses of the second
kind. Most powerful tests are obtained against arbitrary simple alternatives, and in a
number of important cases most stringent tests are derived against certain composite
alternatives. For the example (1), if the distributions are restricted to probability
densities, Pitman's test based on $\bar{y} - \bar{x}$ is most powerful against the alternatives that the $X$'s and
$Y$'s are independently normally distributed with common variance, and that $E(X_i) = \xi$,
$E(Y_j) = \eta$ where $\eta > \xi$. If $\eta - \xi$ may be positive or negative the test based on $|\bar{y} - \bar{x}|$
is most stringent. The definitions are sufficiently general that the theory applies to both
continuous and discrete problems, and that tied observations present no difficulties. It is
shown that continuous and discrete problems may be combined. Pitman's test, for example,
when applied to certain discrete problems, coincides with Fisher's exact test, and when
$m = n$ the test based on $|\bar{y} - \bar{x}|$ is most stringent for hypothesis (1) against a broad class of
alternatives which includes both discrete and absolutely continuous distributions.


9. Concerning Compound Randomization in the Binary System. JOHN E. WALSH,
Project Rand, Santa Monica, California.


Consider a set of binary digits. The numerical deviation from $\tfrac{1}{2}$ of the conditional
probability that a specified digit equals 0 is called the bias of that digit for the given
conditions on the remaining digits of the set. The maximum bias of the set is defined to be the
maximum of the biases of the digits of the set. A set of binary digits is called random if its
maximum bias is zero. Now consider an array of $(1 + t_1) \cdots (1 + t_K) \times n$ binary digits
such that the rows are statistically independent. A compounding method of obtaining a
set of $t_1 \cdots t_K n$ binary digits from the original array is presented. By suitable choices of
$K, t_1, \cdots, t_K$ the maximum bias of the compounded set can be made extremely small even
if the maximum bias of the original array is not small; this can be done so that
$t_1 \cdots t_K/[(1 + t_1) \cdots (1 + t_K)]$ is moderately large. Also a method is outlined for constructing an
approximately random binary digit table. This table has the property that the maximum
bias of a set of digits taken from the table is an increasing function of the number of digits
in the set.










10. A Multiple Decision Problem Arising in the Analysis of Variance. Epwarp 
PauLson, University of Washington, Seattle. 


In some applications of the analysis of variance, a procedure is required for classifying
varieties into ‘superior’ and ‘inferior’ groups. Consider $K$ varieties, with $x_{i\alpha}$ the
$\alpha$th observation on the $i$th variety ($\alpha = 1, 2, \ldots, r$; $i = 1, 2, \ldots, K$),
let $\bar{x}_i = \sum_{\alpha=1}^{r} x_{i\alpha}/r$ and let $s^2$ be an independent estimate of
the variance. For the $i$th variety form the corresponding interval
$\left(\bar{x}_i - \lambda s/\sqrt{r},\ \bar{x}_i + \lambda s/\sqrt{r}\right)$. The superior group
then consists of the variety with greatest sample mean, together with those varieties whose
corresponding intervals have at least one point in common with the interval for the variety
with the greatest mean. If all varieties fall into one group, this group is labeled ‘neutral’
and the varieties are considered homogeneous. To select $\lambda$, consider the relative
importance of different incorrect classifications. For a given $\lambda$, an explicit
expression is found for $P(A)$, the probability the varieties will not all be classified in
one group when $m_1 = m_2 = \cdots = m_K$ where $m_i = E(\bar{x}_i)$; also explicit expressions
are found for $P(B_1)$ and $P(B_2)$, where $P(B_1)$ is the probability there will not be a
superior group consisting only of the $K$th variety and $P(B_2)$ is the probability there will
not be a superior group consisting of at least the $K$th variety, when
$m_1 = m_2 = \cdots = m_{K-1} = m$ and $m_K = m + \Delta$ ($\Delta > 0$). Similar results are
obtained for classifying $K$ processes according to their variances.
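
The classification rule can be sketched in a few lines. The sketch assumes each variety's interval is its sample mean plus or minus $\lambda s/\sqrt{r}$, which is a reading of the garbled interval in the printed abstract; the function name `classify_varieties` and the dictionary it returns are illustrative choices only.

```python
import math


def classify_varieties(means, s, r, lam):
    """Split varieties into 'superior' and 'inferior' groups.

    means : sample means, one per variety
    s     : independent estimate of the standard deviation
    r     : number of observations per variety
    lam   : the multiplier (lambda) chosen by the experimenter
    Assumption: variety i gets the interval means[i] +/- lam * s / sqrt(r).
    """
    half_width = lam * s / math.sqrt(r)
    best = max(means)
    # A variety is 'superior' when its interval shares at least one point
    # with the interval of the variety having the greatest sample mean.
    superior = [i for i, m in enumerate(means)
                if m + half_width >= best - half_width]
    inferior = [i for i in range(len(means)) if i not in superior]
    if not inferior:
        # Everything falls into one group: label it 'neutral'.
        return {"neutral": superior}
    return {"superior": superior, "inferior": inferior}


groups = classify_varieties([10.0, 9.5, 7.0], s=1.0, r=4, lam=1.0)
```

Two intervals of equal half-width overlap exactly when the means differ by at most twice that half-width, so a larger $\lambda$ makes the ‘neutral’ outcome more likely.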


11. Recurrence Formulae for the Moments and Seminvariants of the Joint
Distribution of the Sample Mean and Variance. Olav Reiersøl, University
of Oslo, Norway.


Let $x_1, x_2, \ldots, x_n$ be independent and having the same distribution. We consider the
arithmetic mean $m$ and the variance $v = (1/(n - 1)) \sum (x_i - m)^2$. Let $\kappa_{rs}$ denote
the seminvariants of the joint distribution of $m$ and $v$, and let the seminvariant generating
operators $K$ be defined by the equations: $\kappa_{r+1,s} = K_1\kappa_{rs}$,
$\kappa_{r,s+1} = K_2\kappa_{rs}$, $K_i \cdot 1 = 0$, $K_i(PQ) = P(K_iQ) + Q(K_iP)$.
An operator which operates only on the first factor of a product shall be
denoted by a prime, and an operator which operates only on the second factor shall be
denoted by a double prime. We have the following general formula, valid for any parent
distribution: Kj[(m — 1)(Ke + «, + «y,) — 2n(Ky + wjo)(Ky’ + «y6)]*(1- (or — moo) + 
m(ki0-K1i0 — 1-«,9)] = 0. Fors = 0, 1, 2, we obtain the formulae, Kj(«o. — nx) = 0, 
Kil(n —- 1) (koe —_ NK21) = 2nxso] = 0, Kil(n - 1)? (Koz -— fin) 8n2(n — 1)k21Kk20 + 4n3(n ~~ 1) 
K39 — 8n3(n — 1)xsol = 0. 
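
The simplest relation of this kind is the standard fact that $E(v) = n \operatorname{Var}(m)$, that is, $\kappa_{01} = n\kappa_{20}$ in the abstract's notation. The sketch below checks it exactly for a small discrete parent distribution by enumerating every sample. The distribution is an arbitrary illustration and the function name is hypothetical; nothing here reproduces the author's operator calculus.

```python
from fractions import Fraction
from itertools import product


def mean_variance_summary(values, probs, n):
    """Return (E[v], Var[m]) exactly for samples of size n.

    values, probs : a discrete parent distribution (probs sum to 1)
    m is the sample mean, v the sample variance with divisor n - 1.
    """
    e_m = e_m2 = e_v = Fraction(0)
    # Enumerate all ordered samples of size n with their probabilities.
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        xs = [Fraction(values[i]) for i in idx]
        m = sum(xs) / n
        v = sum((x - m) ** 2 for x in xs) / (n - 1)
        e_m += p * m
        e_m2 += p * m * m
        e_v += p * v
    return e_v, e_m2 - e_m ** 2


# Illustrative skewed parent distribution on {0, 1, 3}.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
e_v, var_m = mean_variance_summary([0, 1, 3], probs, n=3)
```

Exact rational arithmetic is used so that the identity can be checked with `==` rather than a floating-point tolerance.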


12. The Problem of Identification in Factor Analysis. Olav Reiersøl,
University of Oslo, Norway.


The paper is concerned with the multiple factor analysis of L. L. Thurstone. Thurstone 
has given criteria which he says are almost certain to constitute sufficient and more than 
necessary conditions for uniqueness (i.e. identifiability) of a simple structure. It is shown 
that Thurstone’s criteria are not always sufficient, and conditions are derived which are 
more nearly necessary and sufficient for the identifiability of a simple structure. Let A 
be the matrix of factor loadings with $n$ rows and $r$ columns. When the communalities are
identifiable, the conditions will be: (1) Each column of $A$ should have at least $r$ zeros.
(2) Let us consider the submatrix $B$ of $A$, consisting of all the rows which have zeros in the
$k$th column. Then, for $q = 1, 2, \ldots, r - 1$, there should for any combination of $q$ columns
different from the $k$th, exist at least $q + 1$ rows of $B$ containing non-zero elements in the $q$
columns. This should be true for any value of $k$.
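
Conditions (1) and (2) are mechanical enough to check by program. The sketch below is one reading of them: a row of $B$ is counted for a set of $q$ columns if it has a non-zero element in at least one of those columns. That interpretation, the `tol` parameter and the function name are assumptions rather than part of the abstract.

```python
from itertools import combinations


def satisfies_conditions(A, tol=0.0):
    """Check conditions (1) and (2) on a loading matrix A (list of rows)."""
    r = len(A[0])

    def is_zero(x):
        return abs(x) <= tol

    for k in range(r):
        # B: the rows of A having a zero in column k.
        B = [row for row in A if is_zero(row[k])]
        # Condition (1): column k has at least r zeros.
        if len(B) < r:
            return False
        others = [j for j in range(r) if j != k]
        # Condition (2): for every set of q columns other than k, at least
        # q + 1 rows of B are non-zero somewhere in those columns.
        for q in range(1, r):
            for cols in combinations(others, q):
                hits = sum(1 for row in B
                           if any(not is_zero(row[j]) for j in cols))
                if hits < q + 1:
                    return False
    return True


good = [[1, 0], [1, 0], [0, 1], [0, 1]]
bad = [[1, 0], [1, 0], [0, 1], [0, 0]]
```

In `bad`, the rows with a zero in the first column contain only one non-zero entry in the second column, so condition (2) fails for $q = 1$.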










13. Note on Distinct Hypotheses. Agnes Berger, Columbia University, New
York.





As was pointed out by Neyman, one of the difficulties which one may encounter when 
devising a test to distinguish between two exhaustive and exclusive composite hypotheses 
referring to the unknown distribution of a random vector $X$ is the following: If $H_0$ states
that the true distribution function of $X$ belongs to a set $\{F\}$ and $H_1$ that it belongs to a
set $\{G\}$, it may happen that to every Borel set $W$ of the sample space there exists an element
$F_W$ in $\{F\}$ and an element $G_W$ in $\{G\}$ for which the probability of the sample point $x$
falling on $W$ is the same and therefore independent of whether $H_0$ or $H_1$ is true. If this is
the case the pair $H_0$, $H_1$ is called non-distinct, otherwise they are called distinct. The
existence of non-distinct hypotheses is demonstrated by a simple example, $H_0$ consisting of one,
$H_1$ of three suitably chosen step functions. It is shown however that if the sets $\{F\}$ and
$\{G\}$ contain only continuous distribution functions and are at most enumerable then the pair
$H_0$, $H_1$ is distinct. Necessary and sufficient conditions for $H_0$ and $H_1$ to be distinct were
obtained jointly with Wald for an important class of hypotheses each containing a con- 
tinuum of alternatives. 
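
The abstract does not reproduce its example, so the sketch below uses a hypothetical construction of the same shape: one step-function distribution under $H_0$ and three under $H_1$, all concentrated on the points 1, 2, 3. Since the probability of any Borel set depends only on which of these three points it contains, checking the eight subsets is enough. The distributions and function names are illustrative and are not the author's.

```python
from fractions import Fraction as Fr
from itertools import combinations


def prob(dist, W):
    """Probability that the sample point falls in the set W."""
    return sum(dist[x] for x in W)


def is_non_distinct(F_set, G_set, points):
    """True if every subset W has some F and some G giving W equal probability."""
    for size in range(len(points) + 1):
        for W in combinations(points, size):
            if not any(prob(F, W) == prob(G, W)
                       for F in F_set for G in G_set):
                return False
    return True


points = (1, 2, 3)
# H0: a single distribution, uniform on the three points.
H0 = [{1: Fr(1, 3), 2: Fr(1, 3), 3: Fr(1, 3)}]
# H1: three distributions, each agreeing with H0 on one point
# (and therefore also on the complementary pair of points).
H1 = [
    {1: Fr(1, 3), 2: Fr(1, 2), 3: Fr(1, 6)},
    {1: Fr(1, 2), 2: Fr(1, 3), 3: Fr(1, 6)},
    {1: Fr(1, 2), 2: Fr(1, 6), 3: Fr(1, 3)},
]
non_distinct = is_non_distinct(H0, H1, points)
```

No member of `H1` coincides with the member of `H0`, yet no single critical region $W$ separates the two hypotheses, which is the difficulty the abstract describes.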









14. Place of Statistical Sampling in the Education of Engineers. E. L. Grant,
Stanford University.








There is convincing evidence that many engineering problems could be solved better
with the aid of statistical methods than they are now solved without this aid. However, 
few practising engineers or teachers of engineering have had any training in statistical 
methods. As a result, those engineering problems which are in part statistical problems 
are seldom recognized as such. Even in the field of industrial quality control, in which 
successful applications of some of the simpler statistical techniques have been made in 
many different industries, the surface has barely been scratched and a serious obstacle to 
progress is the lack of a widespread appreciation of the statistics point of view among 
design engineers, production engineers, inspection personnel, and management. 

This condition might gradually be corrected if during the next few years instruction in 
statistics should be introduced into all undergraduate engineering curricula. However, 
some recent discussions touching on the subject of statistics instruction for engineering 
students (e.g., the report on “The Teaching of Statistics” which appeared in the March
1948 issue of the Annals of Math. Stat.) have been most unrealistic regarding the amount of
statistics instruction which could be added to engineering curricula. These discussions 
have suggested a full year of basic statistics followed by one or more courses in engineering 
applications. Desirable as this arrangement might be from the point of view of the most 
effective instruction in statistics, it is out of the question when considered in the light of 
the many subjects which are needed in engineering curricula. Although undergraduate 
engineering curricula have always been tighter than other curricula, the pressures today 
are greater than ever before—for more time devoted to the humanistic-social stem, for 
more time in basic mathematics and science, for introductory courses in various economic 
and management subjects such as engineering economy, accounting, industrial relations, 
business law, and industrial organization and management, and for more time in the various 
departmental courses in engineering subjects in order to permit presentation of important 
recent developments in engineering technology. Under these circumstances the most that
can be hoped for in the undergraduate program is a single statistics course for one term,
possibly three units for one semester or four units for one quarter. This should be supple- 
mented by additional statistics instruction for some graduate students in engineering. 
A few engineering graduates should be encouraged to take graduate degrees in statistics 
and to make careers in the field of applied statistics. 















































In a successful undergraduate statistics course for engineering students, the problems 
and illustrations should be selected with two purposes in mind. One purpose, of course, 
should be to develop the principles of probability and statistical method. The other, 
equally important if these engineering graduates are to persuade their colleagues and 
superiors to adopt the statistics point of view in approaching engineering problems, should 
be to demonstrate how statistical method provides a useful guide to action in many different 
engineering situations. Applications of statistics to industrial quality control provide 
particularly good problems and illustrative examples which serve this second purpose. 


15. Statistical Problems of Medical Diagnosis. Jerzy Neyman, University of
California, Berkeley.


“Diagnosis” is used to describe the outcome of a strictly defined test $T$, such as the
Wassermann test, which may lead to either of two possible outcomes, “positive” or “negative”.
Cases contemplated are such that at the time the test $T$ is performed it is impossible to
verify its verdict for certain and the best one can do is to repeat the test. It is postulated
that to each individual of a population there corresponds a probability $p$ that the test $T$
will give a positive outcome. The value of $p$ may vary from one individual to another.
It is presumed that as $p$ increases, the illness in the patient increases. Problem of
comparison of two alternative tests and problem of estimating the distribution of $p$ reduce to
problems relating to the distribution of $X$ = number of positive outcomes in $n$ independent
diagnoses. Statistical machinery suggested is that of BAN estimates (Public Health
Reports, Vol. 62, (1947), p. 1449). Principal result reported is that, with the mathematical
model used in the paper quoted, the empirical variances of four BAN estimates computed 
for 205 samples of 1000 elements each agreed reasonably with the theoretical asymptotic 
values. Empirical distributions of three of these estimates did not show deviations from 
normality. That of the fourth was non-normal. It seems therefore that the asymptotic 
procedure of BAN estimate may be adequate for similar analyses. 
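
When $p$ varies between individuals, the number $X$ of positive outcomes in $n$ independent diagnoses of a randomly chosen individual follows a mixture of binomial distributions. The sketch below computes that distribution for a discrete distribution of $p$. The two-point distribution used here is purely illustrative and is not the model of the paper; `positives_pmf` is a hypothetical helper.

```python
from math import comb


def positives_pmf(n, ps, weights):
    """P(X = k), k = 0..n, for X = number of positives in n diagnoses.

    ps      : possible values of the individual probability p
    weights : population proportions having each value of p (sum to 1)
    Assumption: the n diagnoses of one individual are independent.
    """
    return [
        sum(w * comb(n, k) * p ** k * (1 - p) ** (n - k)
            for p, w in zip(ps, weights))
        for k in range(n + 1)
    ]


# Illustrative population: 90% with p = 0.05 and 10% with p = 0.8.
pmf = positives_pmf(5, ps=[0.05, 0.8], weights=[0.9, 0.1])
```

Estimating the distribution of $p$ then amounts to fitting the parameters of such a mixture to the observed frequencies of $X$.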


16. Power of Certain Tests Relating to Medical Diagnosis. C. L. Chiang and
J. L. Hodges, Jr., University of California, Berkeley.


Associate with each individual in a population $\pi$ the probability $p$ that he will be found
tubercular when examined by a standard X-ray technique. Yerushalmy and others [J. Am.
Med. Assn., Vol. 138, (1947), p. 359] performed 5 independent such diagnoses on each of 1256
persons. Neyman [Public Health Reports, Vol. 62, (1947), p. 1449] proposed a simple
four-parameter model for the distribution of $p$ in $\pi$, estimated the parameters from the data
of Yerushalmy and others, and obtained a satisfactory fit. In the present paper, the work of
Neyman is paralleled with four new models, all giving satisfactory fit with the same data.
The five models differ considerably in shape, and in the number of repeated diagnoses which
they indicate to be necessary to detect a high proportion of those individuals having, say,
$p \geq 0.1$. Therefore further preliminary study seems indicated before one can design a mass
survey to detect a high proportion of such persons. The approximate power of the $\chi^2$ test
of the Neyman model is considered, using one of the other models as alternative. It is 
found that to obtain power 0.7 with level of significance 0.05, it would be necessary to diag- 
nose 5290 individuals 5 times each. 
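
To see why the required number of repeated diagnoses is sensitive to $p$, consider a single individual with fixed $p$ and count him as detected when at least one of $m$ independent diagnoses is positive. This is a simplification of the abstract's population-level question, and the function names are hypothetical.

```python
def detection_probability(p, m):
    """P(at least one positive in m independent diagnoses), fixed p."""
    return 1.0 - (1.0 - p) ** m


def diagnoses_needed(p, target):
    """Smallest m with detection probability >= target.

    Requires 0 < p <= 1 and 0 < target < 1 so that the loop terminates.
    """
    if not (0.0 < p <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    m = 1
    while detection_probability(p, m) < target:
        m += 1
    return m


m_low = diagnoses_needed(0.1, 0.9)   # individual at the threshold p = 0.1
m_high = diagnoses_needed(0.5, 0.9)  # individual with a much larger p
```

Individuals near the threshold dominate the requirement, which is why models that place different amounts of mass near $p = 0.1$ lead to different survey designs.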


17. Iterative Treatment of Continuous Birth Processes. T. E. Harris, 
Project Rand, Santa Monica, California. 


Random variables $z_n$ are defined by $z_0 = 1$; $P(z_1 = r) = p_r$, $r = 1, 2, \ldots$; if
$z_n = k$, $z_{n+1}$ is the sum of $k$ independent variates, each distributed like $z_1$. Let
$x = \sum_{r=1}^{\infty} r p_r < \infty$; $0 < p_1 < 1$. The generating function
$f(s) = \sum_{r=1}^{\infty} p_r s^r$ is said to be C.I. if there exists a family of generating
functions $f(s, t)$ with $f(s, 1) = f(s)$, $f[f(s, t), t'] = f(s, tt')$ for all nonnegative $t$
and $t'$. A necessary and sufficient condition that $f(s)$ be C.I. is that the numbers $a_r$,
$r = 2, 3, \ldots$, be nonnegative; the $a_r$ are determined recursively by requiring that the
power series $\xi(s) = -s + \sum_{r=2}^{\infty} a_r s^r$ satisfy formally the functional
equation $\xi(s) f'(s) = \xi[f(s)]$. The problem is connected with classical works on iteration.
If $f(s)$ is C.I., the given Markoff process can be imbedded in a continuous birth process. If
$\xi(s)$ is given, the m.g.f. $\phi(s)$ of the asymptotic distribution of the variate $z_n/x^n$
may be determined from the formula

$$\phi^{-1}(s) = (s - 1) \exp\left\{\int_1^s \left[\frac{\xi'(1)}{\xi(y)} + \frac{1}{1 - y}\right] dy\right\}.$$

Various properties of the corresponding distribution can be inferred from this expression.
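
The recursion for the $a_r$ follows from equating coefficients of $s^r$ on both sides of $\xi(s) f'(s) = \xi[f(s)]$. The sketch below carries this out in exact arithmetic up to a chosen degree $N$. It is one straightforward way to organize the computation and is not taken from the paper; the function names and the list layout (index 0 unused) are assumptions.

```python
from fractions import Fraction


def xi_coefficients(p, N):
    """Coefficients c[1..N] of xi(s) = sum_r c[r] s^r with c[1] = -1.

    p[r] is P(z_1 = r) for r = 1..N (p[0] is ignored); requires 0 < p[1] < 1.
    For r >= 2, c[r] is the number a_r of the abstract.
    """
    f = [Fraction(0)] + [Fraction(p[r]) for r in range(1, N + 1)]
    # powers[j][r] = coefficient of s^r in f(s)**j, truncated at degree N.
    powers = [None, f]
    for j in range(2, N + 1):
        prev, cur = powers[j - 1], [Fraction(0)] * (N + 1)
        for a in range(N + 1):
            if prev[a] != 0:
                for b in range(1, N + 1 - a):
                    cur[a + b] += prev[a] * f[b]
        powers.append(cur)
    c = [Fraction(0)] * (N + 1)
    c[1] = Fraction(-1)
    for r in range(2, N + 1):
        # Coefficient of s^r in xi(s) f'(s) minus that in xi(f(s)),
        # restricted to the already known c[1..r-1].
        total = sum(c[j] * ((r - j + 1) * f[r - j + 1] - powers[j][r])
                    for j in range(1, r))
        c[r] = total / (f[1] ** r - f[1])
    return c


def nonnegative_up_to(c):
    """Necessary check only: a_2..a_N >= 0 for the computed terms."""
    return all(a >= 0 for a in c[2:])


# Geometric offspring law p_r = 2**(-r), truncated at N = 6.
geometric = [0] + [Fraction(1, 2 ** r) for r in range(1, 7)]
c_geo = xi_coefficients(geometric, 6)
# Offspring law with P(1) = P(2) = 1/2.
c_two = xi_coefficients([0, Fraction(1, 2), Fraction(1, 2), 0], 3)
```

For the geometric law the computed series is $\xi(s) = -s + s^2$ up to the degree checked, while for $f(s) = (s + s^2)/2$ the coefficient $a_3$ is negative, so that law fails the stated condition.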






















18. Estimation of Means on the Basis of Preliminary Tests of Significance.
Blair M. Bennett, University of California, Berkeley.





This paper examines the statistical procedure of pooling two sample means on the basis
of the results of one or more preliminary tests of significance. Let $x_i$, ($i = 1, \ldots, N_1$),
represent a sample of $N_1$ observations from a normal population $N(\xi, \sigma_1)$; and $y_j$ a
sample of $N_2$ observations from $N(\eta, \sigma_2)$. An estimate of $\xi$ which is commonly used
in certain practical situations is given by: $x' = \bar{x}$, or
$x' = (N_1\bar{x} + N_2\bar{y})/(N_1 + N_2)$, according as the sample means $\bar{x}$, $\bar{y}$
do or do not differ significantly on the basis of a preliminary test. The distribution of the
estimate $x'$ is determined, according as $\sigma_1 = \sigma_2$ are known or unknown. In both
situations, the maximum (or minimum) bias is computed as a function of various levels of
significance of the preliminary test of equality of means. Also, the mean square error of
the estimate $x'$ is calculated in both cases. If now equality of variances cannot be assumed,
but an F-test of the sample variances $s_1^2$, $s_2^2$ does not indicate any significant
difference, then in practice $\bar{x}$, $\bar{y}$ may be pooled, the weights being inversely
proportional to the sample variances. Thus, the usual estimate of $\xi$ will be of the form:
$x' = \bar{x}$, or $x' = (N_1\bar{x}/s_1^2 + N_2\bar{y}/s_2^2)/(N_1/s_1^2 + N_2/s_2^2)$, according
as $\bar{x}$ and $\bar{y}$ do or do not differ significantly on the basis of the Student t-test,
subsequent to an F-test. The bias and mean square error of this estimate have been computed
with the aid of the conditional power function of the t-test subsequent to an F-test.
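
The first estimate can be sketched for the case of a known common standard deviation, where the preliminary test is a two-sided normal test of equal means. The choice of test, the default level and the function name are assumptions made for illustration; the abstract also covers the unknown-variance case, which is not shown.

```python
import math
from statistics import NormalDist, fmean


def pretest_estimate(x, y, sigma, alpha=0.05):
    """Estimate the mean of the x-population after a preliminary test.

    Returns the x sample mean if the two sample means differ
    significantly at level alpha, otherwise the pooled mean.
    Assumption: both populations are normal with known common sigma.
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    z = (xbar - ybar) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if abs(z) > critical:
        return xbar                      # means differ: do not pool
    return (n1 * xbar + n2 * ybar) / (n1 + n2)  # pool the two samples


kept_separate = pretest_estimate([10, 10, 10, 10], [0, 0, 0, 0], sigma=1.0)
pooled = pretest_estimate([1, 2, 3], [2, 3, 4], sigma=1.0)
```

Because the decision to pool depends on the data, the resulting estimate is in general biased, and that bias is the quantity the paper studies as a function of the level of the preliminary test.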









19. Note on Power of the F Test. Stanley W. Nash, University of California,
Berkeley.











Assuming “treatment” expectations to be normal random variables, the ratio of the
sum of squares due to treatments to the sum of squares due to error has a central F dis- 
tribution in the cases of randomized blocks, Latin squares, and one-way classifications. 
The F statistic converges in probability to a constant as the number of treatments is in- 
creased. This is one plus a multiple of the variance between treatment expectations. 
The power of the F test increases monotonely to one as the number of treatments is in- 
creased. This power can be calculated using tables of the incomplete beta function. 













20. Best Asymptotically Normal Estimates. E. W. Barankin and J.
Gurland, University of California, Berkeley.





The methods of minimum $\chi^2$ developed by Neyman for obtaining BAN (best asymptotically
normal) estimates of the parameters appearing in the multinomial distribution
are generalized to obtain certain optimum types of estimates in the case of an arbitrary
distribution under certain restrictions. Let the random vector $X$ have the probability
density $v(x; \theta)$ in the absolutely continuous case and let $v(x; \theta) = P\{X = x \mid \theta\}$
in the discrete case, where $\theta$ is a fixed vector in the parameter space. Functions
$\phi_i(X)$, ($i = 1, 2, \ldots, r$) are selected for the purpose of forming estimates; these
estimates are taken to be functions of the sample moments $\frac{1}{n}\sum_{j=1}^{n} \phi_i(x_j)$.
Certain quadratic forms which depend on the choice of functions
$\phi_1(X), \phi_2(X), \ldots, \phi_r(X)$ are minimized with respect to the parameters.
In this manner, asymptotically normal estimates are obtained which are consistent, and
have minimum asymptotic variances within the class of estimates so determined by the
particular functions $\phi_1, \phi_2, \ldots, \phi_r$. It is possible, through a modification of
this procedure, to obtain estimates by solving a set of linear equations. If $v(x; \theta)$ has
the form

$$v(x; \theta) = \exp\Big\{\beta_0(\theta) + \sum_{i=1}^{k} \beta_i(\theta)\gamma_i(x) + \gamma_0(x)\Big\}$$

it can be shown that the best choice of the $\phi$'s is $\gamma_1(x), \gamma_2(x), \ldots, \gamma_k(x)$.
Maximum likelihood estimates belong to this class of BAN estimates.




























BOOK REVIEW 


The Theory of Games and Economic Behavior. John von Neumann and Oskar
Morgenstern. Princeton University Press, 1947; Second Edition, Pp. xviii,
641. $10.00


REVIEWED BY LEONID HURWICZ¹


Iowa State College 


This review is devoted to the second edition of a book which from its first 
appearance was acknowledged to be a major contribution in the field of theory 
of rational behavior. As is pointed out in the Preface, “the second edition
differs from the first in some minor respects only”. The main change is the
addition of a proof (of “measurability” of utility) omitted in the first edition.

The book’s objective is to solve the problem of rational behavior in a very 
general type of situation. 

It is, therefore, not surprising that its results are of relevance in many fields 
of knowledge, among them economics and statistical inference. 

In both economics and statistics the problem of rational behavior is a funda- 
mental one. Thus one of the classical problems treated by the economic theory 
is that of profit maximization by a firm. The firm is assumed to be maximizing 
its net profit which is a function of prices of the product, materials used, etc., as 
well as the quantities used and produced. In the simplest case prices are taken 
as given; more generally they are assumed to be functions (known to the firm) 
of the quantities sold and purchased. But assuming this function to be known 
presupposes the knowledge of behavior of other firms. This procedure has for 
a long time been regarded as highly unsatisfactory; it is analogous to elaborating 
the theory of rational behavior of a poker player on the assumption that he knows 
the strategy of the other players! 

It is the type of situation in which not only the behavior of various individ- 
uals, but even their strategies, are interdependent, that is treated by von Neu- 
mann and Morgenstern. The essence of their solutions is to base the optimal 
strategy on the minimax principle. As applied to a game, the principle re- 
quires that one should choose a strategy which minimizes the maximum loss 
that could be inflicted by the opponent. 
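
The principle is easy to state computationally for a finite two-person game with a numerical loss table. The sketch below is a hypothetical illustration, not an example from the book: `loss[i][j]` is the loss to the first player when he uses strategy `i` and his opponent uses strategy `j`, and both function names are invented here.

```python
def minimax_pure(loss):
    """First player's strategy minimizing his maximum possible loss.

    Returns (strategy index, guaranteed upper bound on the loss).
    """
    worst = [max(row) for row in loss]
    i = min(range(len(loss)), key=lambda k: worst[k])
    return i, worst[i]


def maximin_pure(loss):
    """Opponent's strategy maximizing the minimum loss he can inflict.

    Returns (strategy index, guaranteed lower bound on the loss).
    """
    cols = range(len(loss[0]))
    least = [min(row[j] for row in loss) for j in cols]
    j = max(cols, key=lambda k: least[k])
    return j, least[j]


# A small loss table and the game of matching pennies.
table = [[3, 1], [2, 2], [0, 5]]
pennies = [[1, -1], [-1, 1]]

row_choice = minimax_pure(table)
gap = minimax_pure(pennies)[1] - maximin_pure(pennies)[1]
```

In matching pennies the two guaranteed values differ, so no pair of fixed strategies is stable against the opponent's best reply.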

The minimax principle, when applied by both players need not, in general, 
lead to a stable solution. To ensure the existence of such a solution the authors 
are led to the postulate that the choice of strategies be made through a random 
process. The minimax to be found is that of the mathematical expectation of 
the loss in the game. The latter postulate is of a restrictive nature² since it
implies that the game is played for numerical (“measurable”) stakes and that





¹ On leave to the United Nations Economic Commission for Europe.
² See Jacob C. Marschak, “Neumann’s and Morgenstern’s New Approach to Static
Economics”, The Journal of Political Economy, Vol. LIV (1946).


the second and higher moments of the probability distribution of the losses are 
immaterial. This restriction, however, has permitted the authors to go deeper 
in other directions. Given the great complexity of the problem, even in its 
restricted version, the authors’ decision can hardly be criticized. One could 
only wish that similar considerations had made the authors more tolerant towards 
other work in the field of economics than is shown in some sections of the book. 

The readers of the Annals will be particularly interested in the connection 
between the Theory of Games and the theory of statistical inference. 

As has been pointed out by Abraham Wald³ the problem faced by the statistician
is somewhat similar to that of a player in a game of strategy. The theory
of statistical inference may be viewed as a theory of rational behavior of the
statistician. His “strategy” consists in adopting an optimal test or estimate,
more generally an optimal decision function. This optimal decision function
must be chosen without the knowledge of the “a priori” distribution of the
population parameters. Wald’s basic postulate of minimization of maximum risk
is equivalent to regarding the statistician as a player in a game of strategy, with
“Nature” as the other player. The optimal decision function is chosen in a
way which (as shown by Wald) is equivalent to assuming the “least favorable”
a priori distribution of the parameters. As Wald says, “we cannot say that
Nature wants to maximize [the statistician’s risk]. However, if the statistician
is completely ignorant as to Nature’s choice, it is perhaps not unreasonable to
base the theory of a proper choice of [the decision function] on the assumption
that Nature wants to maximize [the statistician’s risk]”.

It may be noted, however, that statistical inference, as seen by Wald, is a 
relatively simple game since it involves only two players and is of the zero-sum 
variety. 

The admiring and enthusiastic reception given to the book’s first edition would 
make any further general appraisal somewhat anticlimactic. Suffice it to say
that a good deal of valuable work has already been stimulated by the Theory of 
Games, both in the field of social sciences and in mathematics. 


³ Abraham Wald, “Statistical Decision Functions which Minimize the Maximum Risk”,
Annals of Mathematics, Vol. 46, (1945).





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items


Dr. Paul H. Anderson, formerly an Economist with the War Assets Adminis- 
tration, Washington, D. C., has been appointed Professor of Marketing at 
Loyola University, New Orleans, Louisiana. 

Mr. N. H. Carrier has resigned his position with the Mathematical Statistics 
Section, Chief Scientific Advisers Division, Ministry of Works, England to ac- 
cept an appointment as Statistician in the General Register Office, Somerset 
House, Strand, London, W. C. 2, England. 

Dr. T. Freeman Cope has been promoted to a full professorship at Queens 
College, Flushing, New York. 

Dr. Wayne W. Gutzman, who was formerly at the Postgraduate School, 
Naval Academy, Annapolis as an Assistant Professor, has accepted a professor- 
ship in the Department of Mathematics, University of South Dakota. 

Mr. Elvin A. Hoy has transferred from the position as Chief, Statistics Sec- 
tion, Bureau of Research and Statistics in the Social Security Administration to 
the position as Chief, Research Evaluation Section, Naval Reserve Training 
Publications, Navy Department, Naval Gun Factory, Washington, D. C. 

Dr. Joe J. Livers has been promoted to a full professorship at Montana State 
College, Bozeman, Montana. 

Professor Ernest S. Keeping has returned to his position at the University of
Alberta, Edmonton, Alberta, Canada after having spent the spring term of 1948 
at the Institute of North Carolina. 

Mr. Wharton F. Keppler of the M&R Dietetic Laboratories, Inc., Columbus, 
Ohio has recently qualified as a Professional Industrial Engineer in the State of 
Ohio. 

Mr. Ralph Mansfield has formed his own company to manufacture electrical 
testing equipment. The company is known as the Auto-Test, Incorporated, 
with Mr. Mansfield acting as Vice-president and Chief Engineer. 

Mr. Jack Moshman has resigned an instructorship in mathematics at the 
University of Tennessee to accept the post of Statistician to the Medical Advisor, 
United States Atomic Energy Commission, Oak Ridge, Tennessee. 

Mr. Bernard E. Phillips has resigned his position as teacher of mathematics 
in the Newark, New Jersey high schools to do statistical work for the Glenn L. 
Martin Co., Baltimore, Maryland. 

Dr. W. R. Van Voorhis, Associate Professor of Mathematics, Fenn College, 
attended, as a representative of the Institute of Mathematical Statistics, the 
inauguration ceremonies of Dr. Keith Glennan as President of Case Institute of 
Technology, Cleveland, Ohio. 




Atomic Energy Commission Fellowships 


The National Research Council is announcing a new program of fellowships 
supported by funds provided by the Atomic Energy Commission as a part of the 
Commission’s responsibility for future atomic energy research. Accordingly, 
these fellowships will be awarded in the many fields of science related to research 
in atomic energy. 

A considerable number of these fellowships is available to young men and 
women who wish to continue in graduate training or research for the doctorate 
in an appropriate field of science. Others of these fellowships will provide train- 
ing in biophysics applied to the control of radiation hazards. An additional 
number of fellowships will be assigned to those below the age of 35 who have 
already achieved the doctorate and who wish to secure advanced research train- 
ing and experience in those aspects of the physical, biological and medical 
sciences related to atomic energy. Tenure of the fellowship does not impose on 
the fellow any commitment with regard to subsequent employment. 

The candidates will be selected by the fellowship boards of the National Re- 
search Council established for this program. In the postdoctoral field, there 
will be three groups of fellowships, the basic stipend of which will be $3000. For 
the selection of fellows for advanced research and training in the general field of 
the physical sciences, a board has been established with Dr. Roger Adams, 
Professor of Chemistry, University of Illinois, as chairman. In the general 
field of the biological sciences, exclusive of the medical sciences, selections of 
postdoctoral fellows will be made by a board under the chairmanship of Dr. R. 
G. Gustavson, Chancellor of the University of Nebraska. For the selection of 
postdoctoral fellows in the medical sciences, a board has been set up under the 
chairmanship of Dr. Homer W. Smith, Professor of Physiology, College of 
Medicine, New York University. 

The program provides for two groups of fellows in the predoctoral field, with 
stipends ranging from $1500-2100. One group of fellows will work in the bio- 
logical and basic medical sciences including applied biophysics related to the 
measurement and control of radiation hazards and the effect of radiation upon 
health. Selections will be made by a board under the chairmanship of Dr. 
Douglas Whitaker, Professor of Biology, and Dean of the School of Biological 
Sciences, Stanford University. Another group of predoctoral fellows will be 
selected for study and research in the general field of the physical sciences. 
Selections will be made by a board under the chairmanship of Dr. Henry A. 
Barton, Director of the American Institute of Physics. 

Fellowships will be granted for study and research in universities or other 
nonprofit research establishments approved by the fellowship boards. Awards 
will be made for the academic year 1948-49. Supervision of a fellow’s program 
of work will be under the direction of the fellowship boards of the National 
Research Council. Further information can be secured by writing to the 
Fellowship Office, National Research Council, 2101 Constitution Avenue, Wash- 
ington 25, D. C. 







Research Fellowships in Psychometrics 


The Educational Testing Service, Princeton, N. J., is offering for 1949-50 its 
second series of research fellowships in psychometrics leading to the Ph.D. 
degree at Princeton University. Open to men who are acceptable to the Gradu- 
ate School of the University, the two fellowships carry a stipend of $2,200 a 
year and are normally renewable. 

Fellows will be engaged in part-time research in the general area of psycho- 
logical measurement at the offices of the Educational Testing Service and will, 
in addition, carry a normal program of studies in the Graduate School. Com- 
petence in mathematics and psychology is a prerequisite for obtaining these 
fellowships. Information and application blanks may be obtained from: 
Director of Psychometric Fellowship Program, Educational Testing Service, 
Box 592, Princeton, N. J. 




Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Actuarial Society of America 
and the American Institute of Actuaries to the nine undergraduates ranking 
highest on the combined score on Part 1 and Part 2 of the 1948 Preliminary 
Actuarial Examinations are as follows: 

First Prize of $200

Edward H. Larson        Massachusetts Institute of Technology

Additional Prizes of $100

                        Haverford College
                        University of Alabama
Joseph P. Fennell       Princeton University
Bert F. Green, Jr.      Yale University
Solomon Leader          Rutgers University
Felix A. E. Pirani      University of Western Ontario
Richard J. Semple       University of Toronto
Charles A. Yardley      Dartmouth College


The two actuarial organizations have authorized a similar set of nine prize 
awards for the 1949 Examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three exam- 
inations: 


Part 1. Language Aptitude Examination. 
(Reading comprehension, meaning of words and word relationships, antonyms, and 
verbal reasoning.) 

Part 2. General Mathematics Examination. 
(Algebra, trigonometry, coordinate geometry, differential and integral calculus.) 


Part 3. Special Mathematics Examination. 
(Finite differences, probability and statistics.) 







The 1949 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and 
Canada on May 13-14, 1949. The closing date for applications is March 15, 
1949. 

Detailed information concerning the Examinations can be obtained from either 
of the following organizations: 


American Institute of Actuaries, 
135 South LaSalle Street, 
Chicago 3, Illinois. 


The Actuarial Society of America, 
393 Seventh Avenue, 
New York 1, New York. 




New Members 
The following persons have been elected to membership in the Institute 
(March 1 to May 31, 1948) 


Alder, Arthur, Ph.D. (Univ. of Berne) Professor of Actuarial Science, University of Berne, 
Schlaeflistrasse 2, Berne, Switzerland. 

Andrews, Fred C., B.S. (Univ. of Washington) Research Fellow, Department of Mathe- 
matics, University of Washington, 1/41 Savery Hall, University of Washington, Seattle, 
Washington. 

Archer, John, Actuary, Pensions Section, Lever Brothers and Unilever Ltd., 5A Spencer 
Hill, Wimbledon, S. W. 19, England. 

Benitz, Paul A., M.A. (Stanford Univ.) 173 Serpentine Road, Tenafly, New Jersey. 

Bennett, George K., Ph.D. (Yale) President of the Psychological Corporation, 522 Fifth 
Avenue, New York 18, New York. 

Berrettoni, J. N., Ph.D. (Univ. of Minnesota) Professional Consultant in Statistics and 
Economics, 632 Erie St., S. E., Minneapolis 14, Minnesota. 

Birnbaum, Allan, A.B. (Univ. of Calif., Los Angeles) Teaching Assistant, Mathematical 
Statistics Department, Columbia University, 500 Riverside Drive, Room 484, New 
York 27, New York. 

Blank, Mark, M.A. (Univ. of Pennsylvania) Instructor of Philosophy, University of Penn- 
sylvania, 223 L. Sedgwick, Philadelphia, Pa. 

Blumen, Isadore, B.A. (Univ. of Minn.) Student, Department of Mathematical Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 

Burdick, Wayne E., M.A. (Univ. of Mich.) Student, University of Michigan, 314 S. Fifth 
Avenue, Ann Arbor, Michigan. 

Chaturvedi, Jagdish C., M.Sc. (Agra Univ., India) Lecturer in Statistics, St. John’s College,
37, Delhi Gate, Agra, U.P., India.

Cote, Louis J., A.M. (Univ. of Mich.) Student, University of Michigan, 3/5 North State 
Street, Ann Arbor, Michigan. 

Dunleavy, Mary, A.B. (Hunter College, New York) Statistician, E. I. Dupont de Nemours, 
657 Second Avenue, New York 16, New York. 

Ferber, Robert, M.A. (Univ. of Chicago) Student, University of Chicago, 54 West 89th Street, 
New York 24, New York. 

Forman, John W., M.S. (Univ. of Iowa) Graduate Assistant, Department of Mathematics,
State University of Iowa, Iowa City, Iowa. 
















Franklin, Nathan M., M.S. (Univ. of Mich.) Student, Univ. of Michigan, Box 195, Moodus, 
Connecticut. 

Fraser, Donald A. S., M.A. (Univ. of Toronto) Instructor in Statistics, Graduate College, 
Princeton, New Jersey. 

Grabowski, Edwin F., A.B. (George Washington Univ.) Student, George Washington Uni- 
versity, 1830-30th Street, N.W., Washington, D. C. 

Healy, William C., Jr., B.S.E. (Univ. of Mich.) Student, University of Michigan, 589 Lin- 
coln, Grosse Pointe, Michigan. 

Heimdahl, Olaf E. W., A.B. (Luther College, Washington) Teaching Fellow, Department of 
Mathematics, University of Washington, 4286 Union Bay Lane, Seattle 5, Washington. 

Henriksen, Robert O., B.Sc. (Univ. of Mich.) Student, University of Michigan, 751 Clancy 
Avenue, Grand Rapids, Michigan. 

Howard, William G., B.S. (Western Carolina Teachers College, Cullowhee, N.C.) Student, 
Institute of Statistics, University of North Carolina, Route 1, Morrisville, North 
Carolina. 

Irick, Paul E., M.S. (Purdue Univ.) Mathematics Instructor, Purdue University, 729 
North Grant St., West Lafayette, Indiana. 

Johnson, Elgy S., M.A. (Univ. of Mich.) Student, University of Michigan, 13907 Lincoln 
Street, Detroit 3, Michigan. 

Kaplan, E. L., B.S. (Carnegie Inst. of Tech.) Mathematician, Naval Ordnance Laboratory, 
1427 N. St., N. W., Washington 5, D.C. 

Kaufman, Arthur, M.A. (Columbia Univ.) Student and Lecturer of Mathematics, Columbia 
University, 1280 Sheridan Avenue, New York 56, New York. 

Link, Richard F., B.S. (Univ. of Oregon) 750 W. Sixth St., Eugene, Oregon. 

Marks, Charles L., M.A. (Univ. of North Carolina) Instructor of Mathematics, University 
of North Carolina, 213 Mangum Dormitory, University of North Carolina, Chapel Hill, 
North Carolina. 

Marquardt, Mary, M.A. (Univ. of Illinois) Assistant Professor of Statistics, New York State 
School of Industrial and Labor Relations, Cornell University, Ithaca, New York. 

Mickey, Max R., Jr., B.S. (Virginia Polytechnic Institute) Graduate Student and Graduate 
Assistant, Department of Mathematics, Iowa State College, 706 Ash Avenue, Ames, 
Iowa.

Mindlin, Albert, B.A. (Univ. of California, Los Angeles) Teaching Assistant, Mathematics 
Department, University of California, 2444 Carlston Street, Berkeley 4, California. 

Morris, William S., A.B. (Princeton) Statistician, First Boston Corporation, 100 Broadway, 
New York 5, New York. 

Netzorg, Morton J., Engineer, Development Tire Engineering Department, U. S. Rubber 
Co., Detroit, Michigan, 2523 Gladstone, Detroit 6, Michigan. 

Norton, James A., Jr., A.B. (Antioch College) Graduate Research Assistant, Veterans 
Guidance Center, Purdue University, West Lafayette, Indiana. 

Perrin, John K., A.B. (Columbia College) Assistant Statistician, American Telephone & 
Telegraph Co., 195 Broadway, New York 7, New York. 

Perry, Norman C., M.A. (Univ. of Southern Calif.) Lecturer in Mathematics, Mathematics 
Department, University of Southern California, Los Angeles, California. 

Powell, Charles Jr., Actuary, Coates and Herfurth, Consulting Actuaries, 116 S. Virgil 
Avenue, Los Angeles 4, California. 

Raiffa, Howard, B.S. (Univ. of Mich.) Student, University of Michigan, 1447 Enfield Court, 
Willow Run Village, Michigan. 

Raup, Joan E., B.A. (Barnard College) Statistical Analyst, Bureau of the Budget, 1436 N. 
Street, N. W., Washington 6, D.C. 

Rubinstein, David, B.S. (Univ. of Wash.) Research Assistant, Statistical Laboratory, Uni-
versity of California, 2216 Parker Street, Berkeley 4, California.







Schlenz, John W., B.S. (Baldwin-Wallace College) Student, University of Michigan, 8806 
Vineyard Avenue, Cleveland &, Ohio. 

Scott, Elizabeth L., A.B. (Univ. of California) Research Assistant, Statistical Laboratory, 
Department of Mathematics, University of California, Berkeley 4, California.

Seidman, Herbert, B.A. (Brooklyn College) Junior Statistician, Chief, Statistical Informa-
tion Section, New York University and Student, New York University, 2170 New
York Avenue, Brooklyn iv, New York.

Shaw, Oliver A., B.A. (Univ. of Mississippi) U.S. Air Force, 6481 Brooks Lane, N. W., 
Washington, D.C. 

Shellard, Gordon D., B.S. (Mass. Institute of Tech.) Assistant Section Head, Underwriting 
Studies Section, Actuarial Division, Metropolitan Life Insurance Co., 420 Mountain 
Avenue, Ridgewood, New Jersey. 

Shepherd, Clarence M., M.S. (Case Institute of Tech.) Electrochemical Research Chemist, 
$959 Nichols Avenue, S.W., Washington, D.C. 

Shrikhande, Sharad-Chandra S., B.Sc. (Nagpur Univ., India) Graduate student, Depart- 
ment of Mathematical Statistics, University of North Carolina, Chapel Hill, North 
Carolina. 

Sirlin, Robert, M.A. (Columbia Univ.) Statistician, Financial Analysis, 2046 East 23rd
Street, Brooklyn 29, New York. 

Stacy, Edney W., A.B. (Univ. of North Carolina) Instructor of Mathematics, University 
of North Carolina, 301 W. Franklin Street, Chapel Hill, North Carolina. 

Sternhell, Charles M., B.S. (College of City of N. Y.) Section Head, Actuarial Division, 
Metropolitan Life Insurance Co., 1 Madison Avenue, New York City, New York. 

Tang, Pei-Ching, Ph.D. (Univ. College, London Univ.) Professor, National Central Uni- 
versity, Nanking, China. 

Whitson, Milo E., A.M. (Geo. Peabody College, Nashville) Head of Mathematics Depart- 
ment, California State Polytechnic College, 523 Lawrence Dr., San Luis Obispo, 
California. 

Watson, Geoffrey S., B.A. (Univ. of Melbourne) Student, Institute of Statistics, State 
College, Raleigh, North Carolina. 

Woolsey, Theodore D., B.A. (Yale Univ.) Biostatistician, Division of Public Health Meth- 
ods, U.S. Public Health Service, 111 West Underwood St., Chevy Chase 15, Maryland. 

Wymer, John P., M.A. (Univ. of California, Berkeley) Statistician, U. S. Bureau of Labor 
Statistics, 719 Whittier St., N.W., Washington 12, D.C. 

Yerushalmy, Jacob, Ph.D. (Johns Hopkins Univ.) Professor of Biostatistics, School of 
Public Health, University of California, Berkeley 4, California. 













REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 





The Thirty-fourth Meeting and the Third Regional West Coast Meeting of 
the Institute of Mathematical Statistics was held on the Berkeley Campus of the 
University of California June 22nd through June 24th, 1948, in conjunction 
with the Twenty-ninth Annual Meeting of the Pacific Division of the American 
Association for the Advancement of Science. During the meeting 115 persons 
registered, including the following members of the Institute: 


G. A. Baker, Blair M. Bennett, Carl A. Bennett, Z. Wm. Birnbaum, David Blackwell, 
Albert H. Bowker, George W. Brown, A. George Carlton, Douglas G. Chapman, Andrew G.
Clark, Edwin L. Crow, Dorothy Cruden, Harold Davis, R. C. Davis, W. J. Dixon, Robert 
Dorfman, George Eldredge, Lillian Elveback, Mary Elveback, Benjamin Epstein, M. W. 
Eudey, Evelyn Fix, Merrill M. Flood, H. H. Germond, Meyer A. Girshick, Eugene L. 
Grant, John Gurland, T. E. Harris, J. L. Hodges, Jr., Paul G. Hoel, John M. Howell, Harry
M. Hughes, Leo Katz, H.S. Konijn, T. C. Koopmans, George W. Kuznets, E. L. Lehmann, 
Richard F. Link, A. M. Mood, Stanley W. Nash, J. Neyman, Stefan Peters, G. Baley Price, 
Kathryn B. Rolfe, Leonard J. Savage, Henry Scheffé, Howard L. Schug, Elizabeth L. 
Scott, Esther Seiden, Milton Sobel, Zenon Szatrowski, John W. Tukey, J. R. Vatnsdal, 
A. Wald, John E. Walsh, Holbrook Working, Zivia S. Wurtele. 

















The Tuesday morning session was devoted to a symposium on Mathematical 
Theory of Games with Professor G. C. Evans of the University of California, 
Berkeley, as chairman. Addresses were:
















1. Survey of von Neumann’s mathematical theory of games. J. C. C. McKinsey, Project Rand.
2. Recent developments in the mathematical theory of games. Olaf Helmer, Project Rand.
3. Applications of theory of games to statistics. Abraham Wald, Columbia University.
4. On continuous games. Henri F. Bohnenblust, California Institute of Technology.
5. Discussion. Edward W. Barankin, University of California, Berkeley.









The attendance was approximately 130. 

The Tuesday afternoon session was under the chairmanship of Professor Henri 
F. Bohnenblust of the California Institute of Technology. The invited address, 
Complete Classes of Statistical Decision Functions, by Professor Abraham Wald 
was followed by tea in Senior Women’s Hall and then the following contributed 
papers: 
















1. Identification as a problem of inference. T. C. Koopmans, Cowles Commission for
Research in Economics.
Discussion: Olav Reiersøl, University of Oslo.
2. Estimation of parameters for truncated multinormal distributions. Z. W. Birnbaum,
E. Paulson and F. C. Andrews, University of Washington.
3. A test of the hypothesis that a sample of three came from the same normal distribution.
Carl A. Bennett, General Electric Company.
4. A Note on the application of the abbreviated Doolittle solution to nonorthogonal analysis
of variance and covariance. (By title.) Carl A. Bennett, General Electric Company.








The attendance was between 100 and 150 during the afternoon. 


The Wednesday morning session was devoted to a symposium on Design of 
Experiments with Particular Reference to Agricultural Trials. Dean A. R. 
Davis of the University of California, Berkeley, presided briefly and then 
Professor Abraham Wald took over the duties of chairman. The papers were: 


1. Recent advances in experimental design. R. C. Bose, University of Calcutta.
2. Yield trials with backcrossed derived lines of wheat. G. A. Baker and F. N. Briggs,
University of California at Davis.
3. Selecting subset which includes the largest of a number of means. Charles Stein,
University of California, Berkeley.
4. Discussion. A. G. Clark, Colorado State College; S. W. Nash, University of California,
Berkeley; J. R. Vatnsdal, State College of Washington.
5. The effect of inbreeding on height at withers in a herd of Jersey cattle. W. C. Rollins,
S. W. Mead and W. M. Regan, University of California at Davis.


Attendance was about 100. 

The afternoon session, under the chairmanship of Professor George Pélya of 
Stanford University, began with an invited address by Professor Michel Loéve, 
University of California, Berkeley, on Random Functions and Related Problems. 
This was followed by the contributed papers: 


1. An example of a singular continuous distribution. (By title.) Henry Scheffé,
University of California at Los Angeles.
2. On the theory of some nonparametric hypotheses. E. L. Lehmann and Charles Stein,
University of California, Berkeley.
3. Compound randomization in the binary system. John E. Walsh, Project Rand.
4. A multiple decision problem arising in the analysis of variance. Edward Paulson,
University of Washington.
5. Recurrence formulae for the moments and seminvariants of the joint distribution of the
sample mean and variance. Olav Reiersøl, University of Oslo.
6. Identification problem in factor analysis. (By title.) Olav Reiersøl, University of Oslo.
7. On distinct hypotheses. Mrs. Agnes Berger, Columbia University.


The attendance was approximately 100. 
A symposium on Sampling for Industrial Use occupied the Thursday morning 
session. Professor Z. W. Birnbaum of the University of Washington presided. 


1. Sampling plans for continuous production. M. A. Girshick, Project Rand.
2. Sampling plans with continuous variables for acceptance inspection. A. H. Bowker,
Stanford University.
3. Place of statistical sampling in the education of engineers. E. L. Grant, Stanford
University.
4. Discussion. Henry Scheffé, University of California at Los Angeles; Charles Stein,
University of California, Berkeley; Holbrook Working, Stanford University.


The attendance was approximately 100. 
The first part of the afternoon session, presided over by Professor W. J. Dixon, 
University of Oregon, was devoted to contributed papers: 


1. Statistical problems of medical diagnosis. Jerzy Neyman, University of California,
Berkeley.
Discussion: L. J. Savage, University of Chicago.
2. Power of certain tests relating to medical diagnosis. C. L. Chiang and J. L. Hodges,
University of California, Berkeley.
3. On best asymptotically normal estimates. Edward W. Barankin and John Gurland,
University of California, Berkeley.
4. Iterative treatment of continuous birth processes. T. E. Harris, Project Rand.
5. Estimation of means on the basis of preliminary tests of significance. Blair M. Bennett,
University of California, Berkeley. (By title.)


The attendance was about 90. 

The second part of the afternoon session was the Business Meeting. Professor 
Abraham Wald, President of the Institute, presided. It was recommended that 
two meetings a year be held on the West Coast, one in June in the San Francisco 
Bay Area alternating between Berkeley and Stanford and the other during the 
winter alternating between the North and Los Angeles. The next West Coast 
meeting will be held during the Thanksgiving recess at Seattle. 








